content addressable memory.

(57) A class of lossless data compression algorithms uses a memory-based dictionary (312) of finite size to facilitate the compression and decompression of data. To reduce the loss in data compression caused by dictionary resets, a standby dictionary (328) is used to store a subset of encoded data entries previously stored in a current dictionary. In a second aspect of the invention, data is compressed/decompressed according to the address location of data entries contained within a dictionary built in a content addressable memory (CAM) (312). In a third aspect of the invention, the minimum memory/high compression capacity of the standby dictionary scheme is combined with the fast single-cycle per character encoding/decoding capacity of the CAM circuit. In a fourth aspect of the invention, a selective overwrite dictionary swapping technique is used to allow all data entries to be used at all times for encoding character strings (450-472).
This application is a continuation in part of copending U.S. application Ser. No. 07/996,808 filed December 23, 1992.

BACKGROUND OF THE INVENTION 

This invention relates generally to data compression and decompression methods and apparatus.

=S5b=Ss£=£33=S£~ 

l^2srrs w,a ' c r ent,y r idin9 in 0,6 dict,onary - Af ter each «- •»**. . ™ sztlis. 

The best-known methods that use a dictionary to store compression information are the first and second methods of Lempel and Ziv, called LZ1 and LZ2 respectively. These methods are disclosed in U.S. Patent No. 4,464,650 to Eastman et al. Improvements in the algorithms are disclosed in U.S. Patent No. 4,814,746 to Miller et al. These references further explain the use of dictionaries.

When working on a practical implementation, the amount of memory available for compression/decompression is finite. Therefore, the number of entries in the dictionary is finite and the length of the codewords used to encode the entries is bounded.

attifpo^^^ 

at this point. For example, the dictionary can be frozen in its current state, and used for the remainder
Z a S D TT,: " d aPPr0aCh> ^ diCti ° nary ^ reS6t and a — ^To^Zi^^nl 
tiolyTest " ^ ^ "** ** borates Z the dic 

a rapid deterioration in compression ratio will occur.
A dictionary reset method maintains the learning capability of the algorithm but suffers from a temporary loss in compression after each reset. An alternative is described by Bunton et al. in Communications of the ACM, January 1992, Vol. 35, No. 1. Instead of resetting the entire dictionary, this method replaces one dictionary entry at a time. The least recently used (LRU) code is selected and replaced with a new input character string. The Bunton et al. method improves the compression ratio but has the disadvantage of requiring additional bits for each dictionary entry to identify LRU status. Additional bits for each dictionary entry result in significantly increased hardware costs.

One method for reducing the number of required dictionary resets is to increase the dictionary memory size. Increased memory size, however, increases cost and can increase the time required to search dictionary entries. In addition, present LRU tracking methods become less practical.

Another bottleneck to compression/decompression performance is the amount of time required to search the dictionary for previously encountered character strings. Traditionally, hashing algorithms are used to search for previously stored dictionary entries and to locate available memory locations for new character strings, as disclosed in U.S. Patent No. 4,558,302 to Welch (LZW).

The hashing algorithm maps each unique dictionary entry into the RAM space at an address based on some simple arithmetic function of the data word contents. Since such an algorithm uses the entire word or a portion of the word to calculate the mapping address, more than one data word might map to the same location in memory, causing a hashing collision. In this case, an alternative location must be found for the data word. Inevitably, as the RAM locations fill up, a second dictionary entry will hash to a previously-used location. This collision must be resolved before compression can continue. Hashing algorithms, and the logic needed to resolve collisions, add considerable complexity to the compression/decompression system logic, and reduce system throughput.

Typically the dictionary based upon the data being compressed will be a small subset of all possible data entries. So, one method for reducing hashing collisions is to increase the number of dictionary storage

with the compression/decompression control logic. In addition, a larger memory could increase the search time required to determine if a character string has previously been loaded into memory.
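The collision problem described above can be illustrated with a minimal sketch. The hash function, the table size, and the linear-probing policy below are assumptions chosen only for illustration; they are not taken from the cited references.

```python
def insert(table, entry):
    """Store a (codeword, byte) entry using an illustrative hash with linear probing.

    Returns the number of table locations that had to be examined.
    """
    codeword, byte = entry
    size = len(table)
    slot = (codeword * 31 + byte) % size      # hypothetical "simple arithmetic function"
    for probes in range(1, size + 1):
        if table[slot] is None or table[slot] == entry:
            table[slot] = entry
            return probes
        slot = (slot + 1) % size              # collision: try an alternative location
    raise MemoryError("dictionary full")


table = [None] * 8
first = insert(table, (0, 31))    # hashes to location 7
second = insert(table, (1, 0))    # also hashes to location 7, so an extra probe is needed
```

Every extra probe in this model corresponds to an additional memory access, which is the throughput cost the passage attributes to hashing.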

Another bottleneck to data compression/decompression is the amount of time and hardware required to encode and decode data character strings. For example, during data compression, after a character string is found not to match any of the data phrases previously stored within memory, it must be stored in an unoccupied data memory location. A codeword must be generated that uniquely identifies the phrases and subphrases within a character string that previously matched dictionary data entries. The codeword must then be stored so that it can be combined with additional characters during further data compression operations.

During data decompression, a compressed data codeword may represent an uncompressed character and an additional codeword, for example, a link to the rest of the uncompressed character string (see Hewlett-Packard Journal, June 1989, pp. 27-31). The described HP-DC scheme encodes character strings and stores the codewords (OMEGA) concatenated with a next byte (K) at a dictionary location determined by a compressed code. Therefore, the dictionary must be read several times before the actual decompressed data string is generated. Since the compressing and decompressing process is iterative, any additional memory access increases compression and decompression time. Present encoding, decoding, and dictionary search methods require more than one clock cycle to compress or decompress each input character. In addition, these encoding and decoding algorithms require complex compression and decompression hardware.

Accordingly, there is a need for improving the performance of dictionary-based data compression systems and improving the encoding and decoding of data in a dictionary-based data compression/decompression system.

SUMMARY OF THE INVENTION 

It is, therefore, an object of the invention to minimize the loss in data compression created when the dictionary in a dictionary-based data compression system is reset.

A second object of the invention is to increase the adaptation properties of data compression systems for input data sequences with changing statistical characteristics.

Another object of the invention is to reduce the amount of time required to encode/decode a character string in a dictionary-based data compression/decompression system.

Another object of the invention is to maximize data compression capacity in a dictionary-based data compression/decompression system with a minimal amount of memory.

A further object of the invention is to minimize the amount of hardware and time required to selectively update a dictionary-based data compression/decompression system.

One aspect of the invention is a data compression/decompression system that simultaneously builds a current dictionary and a standby dictionary. The current dictionary serves the same purpose as the dictionary in a standard data compression engine. The standby dictionary is built in parallel with the current dictionary, so as to contain a subset of the phrases of the current dictionary. This subset is chosen to best characterize the patterns occurring in the source data. When the current dictionary fills up, it is replaced by the standby dictionary, and a new standby dictionary is built "from scratch" as the new current dictionary continues to be built and used for compression. Therefore, the compressor never switches to an empty dictionary, and the deterioration in data compression caused by having a limited dictionary memory size is reduced.

The current dictionary starts with sufficient empty space to add new data entries, thereby allowing continued adaptation to the source data. This feature is of paramount importance in compressing source data with varying statistics. Although some information is lost by switching to a smaller number of data entries in the standby dictionary, the time to rebuild the dictionary to maximum efficiency is still less than after a complete dictionary reset. Therefore, a smaller dictionary memory can be used with less negative impact on the data compression ratio.

The criteria for selecting the subset of the current dictionary that goes into the standby dictionary can vary 
depending upon the specific application. For example, an encoded data string is copied to the standby dic- 
tionary if it has been matched at least once with a data entry in the current dictionary. Alternatively, the entries 
in the standby dictionary can be selected according to string length, most recent data entry matches, or any 
criterion that identifies entries that maximize compression in a given application. 

In addition, the criteria for switching (resetting) from the current to standby dictionary can be changed depending on the type of data or application. For example, the current dictionary can be reset when it is filled with valid data entries. In the alternative, the current dictionary can be reset when the compression obtained by using it falls below a predetermined performance threshold, as described in U.S. Pat. No. 4,847,619 to Kato et al.
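The current/standby scheme described above can be sketched in software as follows. This is a minimal illustration, not the patented implementation: it assumes an LZW-style encoder, the "matched at least once" selection rule, and the "reset when full" switching rule. The function name `compress` and the parameter `max_size` are hypothetical, and a matching decoder (which would have to mirror the same rules) is omitted.

```python
def compress(data: bytes, max_size: int = 512):
    """LZW-style encoder with a current and a standby dictionary (illustrative)."""

    def fresh():
        # Both dictionaries start with one entry per possible single character.
        return {bytes([i]): i for i in range(256)}

    current, standby = fresh(), fresh()
    out, swaps = [], 0
    w = b""
    for byte in data:
        wc = w + bytes([byte])
        if wc in current:
            w = wc                          # keep extending the match
            continue
        out.append(current[w])              # emit code of the longest match
        # Selection rule (assumed): a multi-character string that was matched
        # and emitted is copied into the standby dictionary.
        if len(w) > 1 and w not in standby:
            standby[w] = len(standby)
        if len(current) < max_size:
            current[wc] = len(current)      # store the unmatched string
        else:
            # Current dictionary is full: the standby dictionary takes over and
            # a new standby dictionary is started from scratch.
            current, standby = standby, fresh()
            swaps += 1
        w = bytes([byte])
    if w:
        out.append(current[w])
    return out, swaps


data = b"abcabcabd" * 100
codes, swaps = compress(data, max_size=264)
```

With such a small `max_size` the dictionary fills quickly, so the swap path is exercised; because the standby dictionary already holds the strings that proved useful, the encoder does not restart from single characters only.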

In a second application of the standby dictionary, mainly in situations where the data characteristics are 
stationary, the compressor makes two passes at the data. In the first pass, the compressor scans a large sam- 
ple of the data. The sample is large enough to cause the current dictionary to fill up many times, thereby causing 
the standby dictionary to replace the current dictionary a proportional number of times. At each dictionary 
switch, the current dictionary is "refined" until, after several iterations, the algorithm has built a dictionary 
strongly customized to the data sample. The customized dictionary is then set as the sole dictionary reference 
used by the compression engine during a second pass to compress the input data. The customized dictionary 
thereby performs significantly better than a single dynamic dictionary for the same data. 

A second aspect of the invention is a dictionary-based compression/decompression system architecture 
and method which utilizes the address values of stored data entries in the dictionary of a compression/decom- 
pression system to simplify encoding as well as decoding circuitry. The system preferably uses a content ad- 
dressable memory (CAM) with additional logic circuitry including local feedback circuitry to provide special 
functions that speed up memory access and simplify external compression/decompression logic. The memory 
structure has unique features, that can provide lossless data compression or decompression at a sustained 
rate of one character per clock cycle without hashing or potential for hashing collisions. 

Specifically, the system preferably comprises an associative memory that encodes character strings ac- 
cording to the address locations of data entries contained within the memory. An input character string com- 
bination which has not previously occurred within the input data stream is stored as a new data entry within 
the dictionary. The CAM is organized into "words" which each store a unique character string data entry. The 
memory performs an associative parallel search with an input character string with selected bits in a "word," 
on all words previously stored in the dictionary. In the event of a match, a match line associated with the data 
entry is activated. All the match lines are then encoded into a single codeword representing the character string.
The codeword is then combined with the next input character and again compared with the data entries pre- 
viously stored in memory. Thus, character strings are assigned codewords according to their address locations 
in memory. When a search fails, the codeword (OMEGA) representing the last previously-matched character 
string (e.g., its address) is output and another search is started with a new character string starting with the 
character (K) that caused the match to fail. The compressed data character (codeword) is a pointer to a data 
entry in the dictionary. Therefore, character strings are decoded by using the compressed data character as 
an address into the decompression dictionary. For example, initially, an external compressed character is used 
as an address into the dictionary. The data entry at the decoded address location is then read. If the data entry 
output from memory does not require further decompression (e.g., the memory output is the "root" of a linked 
list) then the data entry is output. If the data entry contains another codeword (e.g., a further encoded link to 
another dictionary address location), then the character at that address is output and the codeword at that ad- 
dress is fed back to memory as the next dictionary address. 
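The address-as-codeword idea can be modelled in a few lines. The sketch below is an assumption-laden software analogue: a Python list stands in for the CAM, the list index plays the role of the address, the parallel search is replaced by a scan, and the dictionary is assumed to be preloaded with 256 root entries. The names `compress`, `expand`, `decompress`, and `NULL` are hypothetical.

```python
NULL = None  # stands in for the "null" codeword stored with a root entry


def compress(data: bytes, size: int = 4096) -> list:
    # Each word holds (codeword, byte); the list index plays the role of the address.
    words = [(NULL, b) for b in range(256)]
    out, omega = [], NULL
    for k in data:
        if (omega, k) in words:                  # parallel search in hardware, a scan here
            omega = words.index((omega, k))      # matched address becomes the codeword
            continue
        out.append(omega)                        # search failed: emit OMEGA
        if len(words) < size:
            words.append((omega, k))             # store unmatched string at next address
        omega = k                                # root entry for K sits at address K
    if omega is not NULL:
        out.append(omega)
    return out


def expand(words, code):
    stack = []                                   # characters are recovered last-first
    while code is not NULL:
        code, byte = words[code]                 # follow the link to the next address
        stack.append(byte)
    return bytes(reversed(stack))


def decompress(codes, size: int = 4096) -> bytes:
    words = [(NULL, b) for b in range(256)]
    out, prev = bytearray(), NULL
    for code in codes:
        if prev is not NULL and len(words) < size:
            # Rebuild the entry the compressor wrote after emitting prev.
            source = code if code < len(words) else prev
            words.append((prev, expand(words, source)[0]))
        out += expand(words, code)
        prev = code
    return bytes(out)
```

For example, `compress(b"abab")` yields `[97, 98, 256]`: the third codeword is simply the address at which the string "ab" was stored.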

An internal address generator is used for both compression and decompression and resets coincident with a memory reset. Any write to the memory (an explicit write or a failed match) will result in the address incrementing to the next address. Incrementing need not be sequential but may be, for example, pseudo-random, as long as both compressor and decompressor address generators are initiated to the same address and increment in the same way, with the result that both compression and decompression dictionaries will be identical. This logic eliminates the need for generating/storing addresses in external control logic, and can result in increased compression/decompression performance (e.g., fewer clock cycles and faster data throughput).
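A small sketch can make the synchronization requirement concrete. It assumes a hypothetical generator that steps through the address space with a fixed stride; the class name, the `stride` parameter, and the choice of a stride that is coprime to the memory size (so every address is visited once) are illustrative assumptions, not details from the specification.

```python
class AddressGenerator:
    """Yields the next free dictionary address (illustrative, non-sequential)."""

    def __init__(self, size: int, stride: int = 1):
        self.size, self.stride = size, stride
        self.reset()

    def reset(self):
        self.address = 0                 # reset coincides with a memory reset

    def advance(self) -> int:
        current = self.address
        self.address = (self.address + self.stride) % self.size
        return current


compressor = AddressGenerator(8, stride=3)
decompressor = AddressGenerator(8, stride=3)
comp_sequence = [compressor.advance() for _ in range(8)]
decomp_sequence = [decompressor.advance() for _ in range(8)]
```

Because both generators start from the same address and step identically, the two sequences agree, and each new entry lands at the same address on both sides.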

To further reduce the time required for data compression, special update circuitry allows a memory search and a data write to be performed during the same clock cycle. When a character string is compared with the data entries within memory, a failed search requires the string to be stored as a new data entry. The next available address location is already known from the address generator and the character string is already residing at the memory data input. Therefore, control logic can be used to automatically write the character string into memory if no match occurs during the search. Thus, the memory is automatically updated during the memory search clock cycle. If a match is found during the search operation, the update circuitry prevents the character string from being loaded into memory as a valid data entry.
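The search-with-automatic-update behaviour can be modelled as a single function call standing in for one clock cycle. This is a behavioural sketch only; the function name `search_cycle` and the list-based memory are assumptions, and no claim is made about the gate-level circuit.

```python
def search_cycle(words, next_addr, entry):
    """Model one cycle: search all valid words and write the entry on a miss.

    Returns (matched address or None, updated next free address).
    """
    for addr in range(next_addr):            # only addresses already written are valid
        if words[addr] == entry:
            return addr, next_addr           # match: the write is suppressed
    if next_addr < len(words):
        words[next_addr] = entry             # miss: entry goes to the next free address
        next_addr += 1
    return None, next_addr


memory = [None] * 4
miss, free = search_cycle(memory, 0, (None, 65))     # first time: stored at address 0
hit, free_after = search_cycle(memory, free, (None, 65))
```

The second call finds the entry written by the first, and the next free address is left unchanged.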

The system and method summarized above thereby provides a simple, inexpensive, and versatile system for fast compression and decompression of data. It can be implemented in software on a general purpose computer or in hardware using custom or semicustom integrated circuitry. The system and method can be used to implement storage/retrieval of linked list data structures. And it can be readily adapted to various adaptive data compression schemes.

The third aspect of the invention combines the minimum memory/high compression capacity of the standby dictionary scheme with the fast single-cycle per character encoding/decoding capacity of the CAM circuit. The circuit maintains multiple dictionaries within the storage locations of a CAM circuit. The CAM circuit receives compressed and uncompressed character strings and stores them as data entries into one of the dictionaries. Codewords representing each data character string are then generated according to the address of the dictionary data entry that matches the character string.

To support multiple dictionaries, each memory location in the CAM contains a status field and a data field. The data field stores data entries and the status field indicates which dictionary is assigned to that data entry. During a search operation, the circuit can mask certain bits of both the status field and the data field. This allows the system to determine which dictionary is assigned to a data entry and to determine if certain memory locations are not currently assigned to a dictionary.
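A masked search over words that carry a status field and a data field can be sketched as below. The field widths and the status encoding (0 for a free location, 1 and 2 for two dictionaries) are assumptions made for the example; the specification's actual encoding is given with its figures.

```python
STATUS_BITS = 2                      # hypothetical width of the status field
DATA_BITS = 20                       # hypothetical width of the data field
DATA_MASK = (1 << DATA_BITS) - 1
STATUS_MASK = ((1 << STATUS_BITS) - 1) << DATA_BITS


def pack(status: int, data: int) -> int:
    """Combine a status value and a data value into one memory word."""
    return (status << DATA_BITS) | data


def masked_search(words, key, mask):
    """Return addresses whose bits agree with key wherever mask has a 1."""
    return [addr for addr, word in enumerate(words) if ((word ^ key) & mask) == 0]


words = [pack(1, 0x41), pack(2, 0x41), pack(0, 0), pack(1, 0x42)]
in_dictionary_1 = masked_search(words, pack(1, 0x41), STATUS_MASK | DATA_MASK)
in_any_dictionary = masked_search(words, pack(0, 0x41), DATA_MASK)
free_locations = masked_search(words, pack(0, 0), STATUS_MASK)
```

Masking out the status field searches every dictionary at once, while masking out the data field locates entries by assignment alone, for instance the unassigned locations.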

Dictionary assignments for each data entry are easily switched by changing the state of the compression/decompression circuit. By changing state, the circuit causes memory locations previously assigned to a dictionary to now constitute free storage locations no longer assigned to any dictionary. These free storage locations are now available for storing new character strings. The state changes can be triggered by different events to maximize the compression ratio and the adaptability of the system to different types of data. For example, the compression/decompression circuit can automatically change states when one of the dictionaries becomes full or alternatively change states when the compression ratio falls below a predetermined performance level.

To further increase the compression ratio of the compression/decompression system, a second Lempel-Ziv Standby Dictionary compression/decompression system (LZSD2) is utilized to selectively replace individual data entries with new character strings. The LZSD2 priority system allows the use of all dictionary entries for string matching
TXSEZZStL* the above'descLd Standby Dictionary methodology Therefore^ ^ »wo ^s are 
needed to identify the next overwrite location, regardless of dictionary s,ze. Dictionanes 
being updated without negatively affecting the data compression rate s.nce each date entry 
to a Sicttonary after a dictionary reset Implementation can also be performed with the same compression/de- 
compression hardware as described above without negatively impacting the data compression raU . 

To provide a single clock cycle search capability, the compression/decompression circuit constructs a standby dictionary in parallel with a current dictionary and searches multiple dictionaries at the same time.

The foregoing and other objects, features and advantages of the invention will become more readily apparent from the following detailed description of a preferred embodiment which proceeds with reference to the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a data flow diagram for a data compression system with current and standby dictionaries in accordance with the invention.

FIG. 2 is a detailed data flow diagram illustrating one example for the standby dictionary data selection process of FIG. 1.

FIG. 3 is a block diagram of an example of data compression circuitry implementing current and standby dictionaries according to the invention.

FIG. 4 is a high level block diagram showing a data compression/decompression system embodying the present invention.

FIG. 5 is a detailed block diagram of the memory and control logic circuitry of FIG. 4.

FIG. 6 is a logic diagram of the auto-update circuitry within the address decoder of FIG. 5.

FIG. 7 is a generalized data flow diagram for the method of data compression/decompression using a content addressable memory (CAM) according to the invention.

FIG. 8 is a detailed data flow diagram for the data compression procedure of FIG. 7.

FIG. 9 is a detailed data flow diagram for the data decompression procedure of FIG. 7.

FIG. 10 is a graphical depiction of the compression and decompression procedures in FIGS. 8 and 9.

FIG. 11 is a block diagram showing a CAM designed for use in a multi-dictionary compression/decompression system according to the invention.

FIG. 12 shows the different fields contained within each dictionary entry in the CAM shown in FIG. 11.

FIG. 13 shows the dictionary values for each compression/decompression state in the ST field of FIG. 11.

FIG. 14 illustrates the state transition changes for the CAM multi-dictionary compression/decompression system.

FIG. 15 is a logic diagram illustrating a simple hardware implementation for changing compressor/decompressor states.

FIG. 16 is a detailed circuit diagram of the main components for the CAM multi-dictionary compression/decompression system shown in FIG. 11.

FIG. 17 is a detailed circuit diagram of a ST pattern generator.

FIG. 18 is a data flow diagram showing the general method for data compression using a CAM with a standby dictionary.

FIG. 19 is a data flow diagram showing the general method for data decompression using a CAM with a standby dictionary.

FIG. 20 is a graphical depiction of the compression and decompression methods in FIGS. 18 and 19.

FIG. 21 is a graph showing the compression results for the CAM multi-dictionary system and for a standard LZW compression scheme.

FIGS. 22A-22E are graphical depictions of a second Lempel-Ziv Standby Dictionary (LZSD2) compression method.

FIG. 23 is a data flow diagram showing the general method for performing LZSD2 compression.

FIGS. 24A, 24B and 24C are a detailed data flow diagram for the procedure shown in FIG. 23.

FIG. 25 is a data flow diagram showing the general method for a LZSD2 decompression method.

FIGS. 26A, 26B and 26C are a detailed data flow diagram for the procedure shown in FIG. 25.

DETAILED DESCRIPTION 

In the following description, the first and second sections separately describe the standby dictionary and 
content addressable memory aspects of the invention. The third section describes a combined implementation 
of the first two aspects of the invention. The fourth section describes an alternative method of operation using 
the system described in the third section.

I. Data Compression/Decompression System Using A Standby Dictionary

FIG. 1 is a data flow diagram for a data compression/decompression system with current and standby dictionaries. The method illustrated in FIG. 1 begins at block 8 with initialization of both the current dictionary (CD) and the standby dictionary (SD). For example, codewords representing every single character possible in the uncompressed input data are put into the dictionaries. Alternatively, the initial dictionaries could be empty. The encoding of character strings from the data sequence is implemented using any desired encoding scheme.

In block 10, input data is compared with previously encoded data entries of a current dictionary to deter- 
mine whether the character string and any of the dictionary data entries match. Block 12 stores an unmatched 
character string as a new encoded data entry in the current dictionary. When a match can no longer be ex- 
tended, the code for the longest matched string is output at block 13. 

Block 14 stores a subset of the previously encoded data entries of the current dictionary (CD) in the standby 
dictionary (SD). The subset selection process in block 14, as stated above, is alterable for specific input data 
to produce the highest compression ratio with a given number of data entries in the standby dictionary. For 

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exantpte. dau. en**, for the standby denary can »'^>Xs"^ 

r;Th=e^ 

Ss7il standby denary, reads a new chance ^"^T "'"'^ ^ £ useli M ge „„,„ , 
A dictionary basad repression/decompress™ ™^*»f ™ ^ 21* * »» Input 

static customed current dictionary that is used to ^S^njclng L current die- 

data sequence Is selected. The current dawy ls °"S^rS Is TSrf in a rid-onl, function 

torary with the standby ^^^X Z^S^^^" "» " aB ^" e "° S '- 
and used by the compression engine exclusively for compress 'r" cnmnression algorithm that utilizes 

. FIG.2isadeteileddateflowdiag ra millustrat.ngon^ 

a current and standby dictionary. HO. 2, ^J^^^^S^-^«^ 

implemented according to specif ic application W™"*"*' dictionary , n bl0 ck 20. Decision block 22 

, . SSTlS — — * "^irrr^cuonar, is no. W WO* 28 

Decision Mock 24 detarmlnes If the current dret onary rs fill Jf the «n» t om y llcbos 

stores th. data shing as a data er*ry in the current ^^ "^^^"^ ™ current director,. 

in block 20 is repeated. hetw „ n the inDut data and an entry in the current 

When decision block 22 determines there ,s a ™£*^££Z^ been stored into the standby 
v dictionary,decisionblock36che^ 

directory. If the data string has not been previously copied in* the ^.J^. block 36 . The 

field within the current dictionary. A*™ 1 ^* f f ^Indicates to the 

flag is associated with the current dictionary data ent y ^"^££^5 directory. This prevents 
compression engine that the data entry has *™ 0 f^°^X^T^l *e date string into the 

« circuft (IC) 50 which is a present* P refe "^j^ 

includesadata compression/decompression *™™j!*™™*"*™^Z wm n random access memory 
54. The DCD IC 50 is used in combmation with dictionary 1^2S52o^hai.ln is conveniently imple- 
(RAM) 88 and dictionary 2 (D2) ^^^Jl^J^Zl £EE as RAMs but can be con- 
mented in a single IC or as separate ICs 50 ^^J™^.^ memory structure. The RAM is con- 

98?ScL.y. and a standby status field (stdby.stat) 92 ^ standby stat us 

Thfdata entry field stores unique data ^nnfls occumng m the input date sequ ^^^ d * HonBry 
field indudes a standby dictionary status flag that ^^^^^^^^^ include a 
S5 has previously been stored in ^^^S^^TSS^Z vaTd Sid in a data compression 
dict.valid field for identifying valid MnrtM ^^00^ entHled~DICTIOMARY RESET PERFOR- 

r CE^C^ S ^ N0 " 07/76M75 ' f i,iR9 ^ 
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9/25/91, and is incorporated by reference (EP-A-0534713) 
twice encoded data string from being copied into the standby dictionary 

. hi»h »„, ^ „, prevloudy e ^,TC"i ■ c """ an ' """"*" 

Original file size: 6,602,300 bytes 
Unix compress: 2,781,686 bytes
Customized dictionary: 2,025,742 bytes 
Compression improvement: 37% 
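The 37% figure can be checked against the byte counts listed above. The short calculation below assumes that "compression improvement" means the relative gain in compression ratio of the customized dictionary over Unix compress; that reading is an assumption, although it is consistent with the numbers given.

```python
original = 6_602_300          # bytes, original file
unix_compress = 2_781_686     # bytes, output of Unix compress
customized = 2_025_742        # bytes, output with the customized dictionary

ratio_unix = original / unix_compress        # about 2.37 : 1
ratio_custom = original / customized         # about 3.26 : 1
improvement = ratio_custom / ratio_unix - 1  # relative gain in compression ratio

print(f"{ratio_unix:.2f} {ratio_custom:.2f} {improvement:.1%}")
```

Under this reading the gain works out to roughly 37%, matching the reported value.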

The customized dictionary thus provides a substantial compression improvement over prior compression methods.
This aspect of the invention can be modified in arrangement and detail without departing from its basic 
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mentation as described below. 

II. Memory Circuit For Lossless Data Compression/Decompression Dictionary Storage

fig a te . block diagram showing the overall arrangement of a circuit 136 for a CAM compression/decom- 

thatar. output on data bus 1 58 » data b^jMM- ™« e " lo the CD engine 142 «ia 

rrTSr^nSnSi r w tzi - ss. sr«s » «- «•«»* «» 

data buffer 150. String table in ^ 00 ^ buffer 140 A microprocessor (not shown) con- 

,« ng S'otelllry comprises an «M. "ray in .ha torn, of a content-a(WreasableRAM 

,„pu, ,64. both fmm ccntr* looic ,46 (FIG. 4). and a mafch = a, »put ,68 ftom „ „ 
Memory 188 provides a set of mach stsnals »,a match l™?*"*"*^ „ " "'bus 190 matches one 
"a^ionaia^^^^ 

Cnr*sTn:S^X «8. in turn —Jjj-J ""J™ 

TA bus 1 58 (FIG. 4). The internal aaaress ge reaa Vwrite and reset signal come from 

search signal 178, a read/wnte s.gnal 164. and a reset signal v bz. . ne r 

control logic 146 (FIG. 4). The address generator includes a counter which is reset le.g., t v 
s zation and subsequently r incrementec I as .the ^JJJ^ J," J M ^ by read select ,ogic 172 and the 
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compression status can be determined by the value of data entry characters. For example, values 

! B °^ e P ' exe ' 176 MU * 1) se,ects an '"P" 1 from eit "er bus 177 orbus 186 and multiplexer 174 (MUX2) 
selects between the output of mux 1 76 and the output of address generator 1 70. Decoder 184 also indudS 

184 E ne V5£ d w5™ ^JS* im P |emente tion Somatic update feature of address decoder 
and aa ic Ln t on« S (/ ^ DRN I 19: °l) in P ut into ^dress decoder 184 from MUX2 174 is fed into two 
^hh? ^ 21 k i,,UStrate 8 Sing ' e address ,ine - ™ D 9 ate 208 is -»« search 

and fcd^S^iZ^SL T - ? 10 ? ° f matCh Si9na ' 1 68 (RG - 5) - ThS S6arCh Signal is also 
twn?Mn J ♦ AND gate 214 along w.th a "qualified" write signal. OR gate 212 receives the outputs from the 

sroCn^ 

searl^StinaTdJf/^f ' ^ * Perf ° rmed during data ^pression operation. If a 

m^rt ^ ♦ 1 com P ress,on ' t" 6 string must be placed into the next available address in 

™Z ? a 2 ZTaZll f t a rti on 1 ST cyde necessary to write the data word into memOT * «*» a ^ 

AND gate 208 goes high if a match does not occur. Since the character string is already on data bus 190 and the next available address is already set by address generator 170, a write can be performed immediately after a match indication occurs. Thus, the inverted match signal NOMATCH activates gate 208, activating the word line (WORDN) associated with the next available memory location.

If a match is found during the search operation, the word select line is disabled and no write operation takes place. The qualified write signal is used to force data writes even when no match occurs in memory, for example, during an external microprocessor write operation. This update feature provides true single-cycle performance since dictionary writes are "transparent," not requiring an extra memory access.

the ^stlt'nr'r' CifCUit 'I" 6 - 6 may be USBd *° S6t 8 " data -^lid" field within memory. For example, 
he system ,n FIG. 5 can copy each new character string into memory prior to checking for a match in memory 

storTd dttfsrring 0 ^' Si9nal " * 3 ™ "* ^ 

Data Compression 

o JI^Th" 0 " °! ° irCUit 14 l' D ? r °° m P Tession > a microprocessor (not shown) initializes the system for oom- 
reset signal 162) come from the uncompressed data interface 152 via control logic 146 (FIG 4) The reset line 

oene^f » I! ^ aSSOC,ated 7 "* """^ ' OCati ° n - addition ' * e reset ,ina initializes the address 
generator to a starting memory location for storing character strings 

Several different techniques can be used for initializing single input characters. For example, single input characters can be algorithmically encoded as part of the compressed data stream. Alternatively, a set of encoded values each representing any single input data character may be loaded into memory.

The address provided by address generator 170 is connected to address decoder 184. An external character string from uncompressed data interface 138 (FIG. 4) is supplied to bus 190, which carries a byte field (DATA_IN [7:0]) and a codeword field (DATA_IN [19:8]). Search signal 178 is then activated, causing memory 188 to compare the codeword/byte string with each location in memory 188. No match will initially occur since nothing has been previously written into memory 188. Therefore the codeword/byte string on bus 190 is written into the first available address location in memory 188 (e.g., the address provided by address generator 170). Address generator 170 is then incremented and a new input character from bus 180 is read into the byte field of the memory data input. The process is repeated, continuing to write unmatched codeword/byte strings into memory 188.

Upon detection of a match, input data select logic 182 directs multiplexer 192 to place the codeword generated from the matched address into the codeword field of bus 190 (DATA_IN [19:8]). A new external character from bus 180 is then fed into the byte field (DATA_IN [7:0]) of data bus 190. The codeword thereby represents the previously matched character string. Because the codeword assigned to the character string is derived directly from the matched data entry address, significantly less control logic is required to encode input characters. In addition, by feeding the codeword back into multiplexer 192 (MUX 3) and combining the codeword with the next input character, an input character can be processed each clock cycle.

The new codeword/byte string is then compared with the data entries within memory 188. The process is
repeated until no match is found. At this point, the compressor outputs the codeword from the last match and
writes the new codeword/byte string into memory 188. The last input character (K) fed into the byte field is
then compared with the updated dictionary (in the case of a dictionary initialized to contain root codewords)
by searching memory for the byte K paired with a null codeword, thereby generating a root codeword to
commence a new string. A new external character (K) from bus 180 is then fed into the byte field and the match
process is repeated, building on a new string (per LZW). Alternatively, the last character K can be output
following OMEGA (as in LZ2), or the address of K can be output as the codeword for K following OMEGA.

When the dictionary fills up, address generator 170 activates a table-full signal 196 that indicates to the
rest of the compression system (FIG. 4) that no further character strings can be written into memory. Any
additional input data is then compressed according to the present entries stored within memory 188.
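
By way of illustration only, the compression sequence described above can be modeled in software. The following Python sketch is not part of the disclosed circuit. The function name cam_compress, the Python dict standing in for the parallel search of memory 188, and the capacity parameter standing in for table-full signal 196 are all assumptions made for the example. Single characters are assumed to be preloaded and paired with a null codeword (None), and every input character is assumed to be one of them.

```python
def cam_compress(data, roots, capacity):
    """Encode `data`; each codeword is the address of the matched entry."""
    # Addresses 0..len(roots)-1 hold single characters paired with a null codeword.
    cam = {(None, ch): addr for addr, ch in enumerate(roots)}
    next_addr = len(roots)          # plays the role of the address generator
    out = []
    omega = None                    # codeword field; None is the null codeword
    for k in data:
        match = cam.get((omega, k))  # stands in for the parallel memory search
        if match is not None:
            omega = match           # match address is fed back as the codeword
            continue
        out.append(omega)           # codeword from the last match
        if next_addr < capacity:    # table full: stop writing, keep encoding
            cam[(omega, k)] = next_addr
            next_addr += 1
        omega = cam[(None, k)]      # root codeword for K starts a new string
    if omega is not None:
        out.append(omega)           # flush the final pending codeword
    return out, cam
```

Once next_addr reaches capacity, no further entries are written and the remaining input is encoded with the entries already stored, which mirrors the table-full behavior described above.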



Data Decompression 

For data decompression, in circuit 144 the operation starts by resetting memory 188 and initializing the
circuit for decompressing input data. Decompression involves linked-list traversal: a compressed data address
may simply refer to an address in memory where the decompressed data string is stored (e.g., a "root"
codeword of a linked list). The address, however, may have a "non-root" codeword (e.g., the codeword is a
link to the next address required to further decompress the encoded character string). As mentioned above,
"root" and "non-root" codewords can be determined in a variety of ways, for example, by the value of the
codeword or, in the alternative, with an identifier bit within memory.

When the compressed data interface 148 (FIG. 4) has compressed data available, it is written to decoder
184 on external address bus 177. After receiving a "non-root" codeword, the memory is read, and (assuming
[...]) the byte field (DATA_OUT [7:0]) of bus 186 is pushed onto a LIFO stack [...] 146 (FIG. 4). The
codeword field (DATA_OUT [19:8]) of bus 186, if a non-root codeword, is fed back to address decoder 184 via
MUX1 and MUX2 and another memory read is performed. Prior to the non-root codeword feedback, the last byte
of the data entry read from memory is pushed into FIFO 140. This process terminates when the memory read
results in a root codeword, at which time a new codeword is read from external address bus 177. After a root
codeword is identified, the last decoded character output is concatenated with the previous external encoded
character and read into the next available address in memory 188. Read select logic 172 checks for "root"
codewords and directs multiplexer 176 accordingly to connect external address bus 177 or the DATA_OUT bus
186 back into address decoder 184. Read select circuit 172 also supplies a coded element signal to control
logic 146 to indicate a completely decompressed codeword. FIFO 140 then dumps the decompressed decoded
characters on bus 154.

The system in FIG. 5 simplifies the decompression operation. Since decompression involves linked list
traversal, built-in logic provides feedback of the memory output data back into the address decoder without
additional interaction with external decompression logic (FIG. 4). Therefore, each decompression cycle will
require less time and the decompression control logic is simplified. There are a number of different
implementations for "qualifying" valid words and codewords in memory 188. One method is to use a comparator
scheme and another is to use an extra, resettable bit for each word. The technique used is dependent upon
specific application requirements. In a unidirectional system (e.g., CD-ROM), the decompression circuit can be
further simplified using a conventional RAM with feedback circuitry as described above for linked list traversal.
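
The linked list traversal with a comparator-style root test can be sketched in a few lines of Python. This is a hedged software model rather than the circuit itself: the name traverse, the list used as the RAM, and the n_roots threshold (a codeword below it is treated as a root) are assumptions chosen for illustration.

```python
def traverse(ram, code, n_roots):
    """Decode one codeword by following codeword links down to a root entry."""
    stack = []
    while code >= n_roots:          # comparator: a non-root codeword
        code, byte = ram[code]      # codeword field is fed back as the next address
        stack.append(byte)          # bytes emerge last character first
    stack.append(ram[code][1])      # a root entry holds the first character
    return "".join(reversed(stack))


# Entries are (codeword, byte); the first four are single-character roots.
ram = [(None, "R"), (None, "I"), (None, "N"), (None, "T"),
       (0, "I"), (1, "N"), (2, "T"), (3, "I"), (5, "T")]
print(traverse(ram, 8, 4))
```

The stack reverses the bytes because the traversal reaches the first character of the string last.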

FIG. 7 is a data flow diagram showing the general method for data compression/decompression or linked
list storage/retrieval in a system according to the above-described aspect of the invention. The method
illustrated below is adaptive such that the dictionary is embedded in the codewords and thereby does not need
to be transferred separately with the compressed data. Alternative methods, for example, where the dictionary
is transferred with the compressed data, can also be implemented using the present system.

Dashed block 232 is the compression process and dashed block 234 is the decompression process for
the system. Uncompressed data (K) at input 224 is supplied to decision block 226 along with the coded
character string (OMEGA) output from block 228. As noted above, OMEGA represents an address of a data entry
encoding a character string. OMEGA and K are concatenated together and compared with the dictionary in
decision block 226. If the OMEGA-K input matches an entry in memory, block 228 encodes the string as the
address of the matched data entry. This encoded value (new OMEGA) is then fed back, concatenated with the
next external character K and input into decision block 226. This process is repeated until an OMEGA-K
string does not match any entries within memory. Block 230 then updates the string table memory with the
OMEGA-K string, outputs OMEGA, and feeds the character K into coding block 228, where it is encoded and
concatenated with the next external data character K before being fed back into decision block 226. The
encoded data, OMEGA, is sent to block 236 for decompression. A given encoded input character
(OMEGA(i)) is used as an address for accessing the string table memory. Decision block 238 determines if
the data entry at the address OMEGA(i) is a root character. If it is, there are no additional encoded characters 
in the data entry output from memory (e.g., OMEGA(j) does not exist). The memory data entry for K is then 
output as a decompressed output character on line 246. Decision block 238 jumps to block 240 where the previous
encoded character (OMEGA(i-1)) is concatenated with K and written into the next available memory address
location. Block 242 then directs block 236 to use the next encoded character (OMEGA(i+1)) in the input
stream as the address location for the next data entry read from memory. 

If the output from the string table memory is not a root (e.g., the output comprises an encoded character 
(OMEGA(j)) and a decoded character K), K is output on line 246 and decision block 238 jumps to block 244.
Block 244 uses the encoded character (OMEGA(j)) as the address for the next data entry output from memory.
The data entry at memory location OMEGA(j) is then processed as described above. The process is repeated
until every encoded input character is decompressed. 

FIG. 8 is a detailed data flow diagram of dashed block 232 in FIG. 7. The data compression process begins 
when a start or reset signal is instigated in block 248. A memory circuit (described below), is initialized in block 
250, for example, to operate in the compression or decompression mode and to reset the dictionary. Any dic- 
tionary valid bits need to be initialized, preferably in parallel. The dictionary may be initialized either with single 
character codewords or with a set of codewords externally generated in accordance with a selected coding 
algorithm, such as LZW disclosed in Welch U.S. Pat. No. 4,558,302 or DCLZ disclosed in the ECMA-151 Stan- 
dard, paired with a null codeword to identify the entry as a single character or "root" codeword. Alternatively, 
rather than pre-storing a set of codewords, they could be generated real time each time a match fails, for ex- 
ample, as disclosed in commonly-assigned U.S. Pat. No. 5,142,282, on Data Compression Dictionary Access 
Minimization. Other initialization schemes can be used, including an empty dictionary. 

The first character in an input data stream is read in block 252 and either stored directly in the OMEGA 
field or encoded (e.g., CODE(CHAR)) then stored in the OMEGA field. Then, the next input character (K) in 
the input data stream is read in block 256. Block 258 shows a process which combines OMEGA and K together 
as a character string (i.e. concatenates OMEGA-K) and then searches the dictionary for a data entry that 
matches the OMEGA-K string. Since no data string has yet been stored in the dictionary, decision block 260 
indicates that there is no match. Since the OMEGA-K string is not presently represented, it is stored in memory 
if decision block 266 determines there is available storage space. If the memory is not full, the operation in 
block 268 automatically loads the OMEGA-K string into the next available memory storage location (ADDR(N)).
Block 270 then increments an address counter to identify the next available storage location in memory 
(ADDR(N+1)). The encoded value OMEGA (an address) for the first input character, if applicable, is output as 
the first character in the encoded data string in block 272. 

When the memory is full, the compression system can simply be disabled from writing any additional char- 
acter strings into memory. For example, if decision block 266 determines that the memory is full, the character 
string loading step of block 268 and the address counter incrementing step of block 270 are skipped and the 
process jumps to the encoding and output process of block 272, further described below. 

After OMEGA is output, the step of block 274 replaces the first input character (OMEGA) with the second 
input character (K) or code(K). The next input character from the input data stream is then read (K) thereby 
providing the next OMEGA-K string. The process then loops back to block 258 where the memory is searched 
with the new OMEGA-K string. 

If a match is indicated by decision block 260, the process jumps to block 264 where the OMEGA field is 
replaced with an encoded value representing the OMEGA-K string, which is equal to the match address. The 
next input character from the data stream is then copied into the K field. The OMEGA and K fields are combined,
forming a new OMEGA-K string which now represents three input characters. The process returns to block 
258 where dictionary data entries are compared with the new character string. Additional input characters are 
added to the character string as long as the previous character string matches a data entry in memory. When 
a new character string no longer matches a data entry, decision block 260 jumps to block 266 where the mem- 
ory update procedures of blocks 266, 268, and 270 are performed as described above. Block 272 outputs the 
value OMEGA (e.g., the encoded character string from the last input character string/data entry match). Block 
274 takes the last character in the character string (e.g., the character that caused the character string to no 
longer match any data entry in the string table) and copies it into the OMEGA field. Block 274 then copies the
next input character from the input data stream into the K field and the process loops back to block 258. The
character string is thereby compressed since the single encoded value of OMEGA output from the compression
process represents multiple input characters. 

FIG. 9 is a detailed flow diagram of decompression circuit 246 in FIG. 7. Block 276 initializes the string 
table memory for decompression. Block 278 gets the first encoded word (OLDWORD). If no more data is
available during this or any subsequent input read step, then the process is exited. The first encoded word is
decoded at block 280, either algorithmically or by reading a preloaded entry in the string table memory. The first
encoded word is a root character and is therefore decoded and output. 

Block 282 gets the next encoded word (INCODE) and block 284 uses INCODE as the address of the data 
entry output by the string table. Initially, in one implementation, the string table will consist of only single char- 
acter bytes, so block 284 will output a byte K. Byte K is then output in block 286. In later cycles, block 284 will 
return OMEGA-K as discussed further below. 

Decision block 288 determines whether the byte is the end of a string (e.g., root character) and, if so, jumps 
to block 292. Block 292 builds a new data entry in the next available address in the string table which consists
of the concatenation of the first encoded input word (OLDCODE) and the last byte output (K). Block 294 points 
to the next unused address location and block 296 replaces OLDWORD with the last encoded input word (IN- 
CODE) and returns to block 282. 

Block 282 reads the next encoded input word (INCODE) and block 284 outputs the data entry at the ad- 
dress INCODE. If the data entry output at address INCODE is not a root, it will include a decoded byte K and 
a codeword field pointing to a next address for further decoding (OMEGA). Block 286 will then output K and 
decision block 288 will jump to block 290. Block 290 uses the codeword field (OMEGA) as the address of the 
next data entry output from the string table and then loops back to block 284. The process is repeated until
the data entry output from the string table contains a root character (i.e., is the end of a string). Decision block 
288 then proceeds to block 292 where the previously read encoded word (OLDCODE) is concatenated with 
the last output byte (K). The functions in blocks 294 and 296 are then performed and then the process returns 
to block 282. Thus, the decompression process regenerates the original data stream compressed in the
compression process of FIG. 8.
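
A minimal Python sketch of the FIG. 9 flow is given below for illustration. It is a software model under stated assumptions, not the disclosed circuit: the function name decompress and the variable names are hypothetical, the table is assumed never to fill, and a stack is used to restore character order because bytes are read last character first. The branch that handles a codeword referring to the entry about to be built is the usual LZW corner case; it is an addition for completeness and is not described in the passage above.

```python
def decompress(codes, roots):
    """Rebuild the character stream from codewords that are table addresses."""
    table = {addr: (None, ch) for addr, ch in enumerate(roots)}  # (OMEGA, K)
    next_addr = len(roots)
    out = []
    oldword = None
    root_byte = None
    for incode in codes:
        if oldword is not None and incode == next_addr:
            # Codeword names the entry not yet written (assumed extra handling).
            table[next_addr] = (oldword, root_byte)
        stack = []
        code = incode
        while code is not None:      # blocks 284 to 290: follow the links
            code, byte = table[code]
            stack.append(byte)       # bytes arrive last character first
        out.extend(reversed(stack))
        if oldword is not None:      # blocks 292 and 294: store OLDWORD + K
            table[next_addr] = (oldword, byte)
            next_addr += 1
        oldword = incode             # block 296
        root_byte = byte             # root character of the string just decoded
    return "".join(out), table
```

Because each new entry is written at the next sequential address, the table built here has the same addresses and entries as the one built by a compressor that follows FIG. 8.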

FIG. 10 is a graphical depiction of the compression and decompression algorithms in FIGS. 8 and 9. A raw
data stream 300 comprises an uncompressed string of characters which are input to the data
compression/decompression process illustrated in FIG. 7. In this example, single characters R, I, N, and T
have been loaded during initialization into locations ADDR0, ADDR1, ADDR2, and ADDR3 of memory 302A
respectively. Input characters are encoded by assigning each character the value of its address location;
however, to increase compression speed, single input characters can be encoded algorithmically prior to
initiating the process described below. Memory 302A illustrates the dictionary in its state immediately
after initialization and memory 302B illustrates the dictionary after compression is complete.

The first input character "R" from data stream 300 matches the data entry at address location ADDR0.
Since there was a match, the compression system concatenates the encoded value for "R" (ADDR0 = 0) with
the next input character "I", and memory 302A is searched for a "0I" match. Because there is no "0I" match
in memory, "0I" is written into the next available memory location (ADDR4), as illustrated in memory 302B.
The codeword for the largest matched sequence (i.e., the codeword for "R" = 0) is output as the first
encoded character in compressed character stream 304. The compression system now searches memory 302B for
the string comprising the encoded value for "I" (i.e., ADDR1 = 1) concatenated with the next input
character "N". Since the string "1N" is not in the dictionary, it is written into the next available
memory location (ADDR5), as shown in 302B. The value 1 (e.g., last matched character string = "I") is
output as the second encoded character in compressed character stream 304.

The process continues to build memory 302B and encode input characters in a similar manner until the
second "I" in the uncompressed character stream 300 is processed (e.g., character 306). The compression
system encodes "I" with the value 1, since "I" is located at address location ADDR1. The encoded value 1
is concatenated with the next input character "N" and the string "1N" is compared with the data entries
in memory 302B. Since the sequence "1N" has occurred previously in character stream 300, the string "1N"
matches an entry in memory (e.g., data entry at ADDR5). The string "1N" is therefore encoded as "5" and
concatenated with the next input character "T". Since the string "5T" does not match any entry in memory
302B, "5T" is written into the next available address location (ADDR8) and the codeword for the last
matched character string, "5", is output in character stream 304. The encoded value for input character
"T" (ADDR3 = 3) is then concatenated with the next input character "I" and the process is repeated.
Memory 302B shows all characters built for the dictionary from character stream 300. Character stream 304
is the complete compressed character stream for raw data stream 300. Notice that only six encoded
characters are required to represent the nine characters in character stream 300.

The decompressor dictionary is reinitialized for decompression as illustrated in 302C so the first four
address locations contain the decoded values for the single input characters R, I, N, and T respectively.
Again, single character decoding may also be performed algorithmically. The first encoded input character
"0" is used as an address into memory 302C. The decompression system determines that the value "0" is a
root codeword, for example, by checking that the value is less than 4. The data entry at ADDR0 (e.g., "R")
is thereby output as the first character in decompressed character stream 308. The decompression system
then reads the next encoded input character "1". This value is again a root codeword and therefore the
data entry at ADDR1 is output as the second character "I" in decompressed character stream 308.

At this point, a new dictionary entry is built using the last decompressed character "I" concatenated with 
the previous codeword "0". The string "0I" is then written into the next available address location (ADDR4), as
shown in memory 302D. The next codeword "2" is input and the process is repeated. This time the data entry
at address location ADDR2 (e.g., "N") is output and then the string "1N" is written into memory at address location
ADDR5. 

The process is repeated in the same manner until input character "5" is read by the decompression system. 
The decompression engine uses this codeword to reference the data entry at ADDR5. The encoded character 
"5" is not a root since it is greater than three, therefore, the decompression system outputs the last byte of the 
data entry at address location ADDR5 (e.g., "N"). The rest of the data entry (e.g., "1") is used as the next
address. Since the codeword "1" is a root, the data entry at ADDR1 (e.g., "I") is output and no further
decompression is required. The decompressed characters "IN" are then placed in character stream 308. A new data
entry in memory is written into address location ADDR7 using the last decompressed output character "I" and 
the previous encoded input character "3". The process is repeated until all characters in character stream 304 
are decompressed. It will be noted that dictionaries built using the HP-DC scheme with hashing are different. 
In contrast, the compression and decompression dictionaries 302B and 302D built by the present system and 
method have identical addresses/entries. 

III. Using Multiple Dictionaries in a CAM Compression/Decompression System 

To further reduce the amount of memory required to compress data using a CAM, the CAM data compres- 
sion system previously illustrated in FIG. 5 is used in conjunction with a standby dictionary (see FIG. 3). The 
CAM, while having the capacity to process one character each clock cycle, can now compress data using mini- 
mal memory. In addition, the data compression ratio is increased by maintaining a useful set of character 
strings in the current dictionary after a reset. The method illustrated below is adaptive whereby the dictionary 
is embedded in the codewords so that a separate dictionary does not have to be transferred before each de- 
compression process. 

FIG. 11 is a high level block diagram of the combined CAM multi-dictionary compression/decompression 
system. For illustrative purposes, the system is implemented using a 2^b x (b + m + 2) CAM 312 similar to that
illustrated in FIG. 5. The CAM 312 comprises a control bus 314 coupled to a control processor (not shown). 
An address bus 316 (b-bits wide) and a data bus 318 (n-bits wide) are coupled to CAM 312. The zero bits of 
an n-bit wide DATA_MASK bus 320 disable the corresponding bits during a CAM search. For example, a "0"
signal on the first mask bit (DATA_MASK[0]) disables the first DATA_IN bit (DATA_IN[0]) fed into CAM 312. A
disabled DATA_IN bit is not taken into account when searching CAM 312 for a data entry that matches the
signal on line 318. Data masking circuits are well known in the art. Therefore, the details of the masking circuit 
used in CAM 312 will not be shown in detail. A match success line 322 goes active whenever the data on bus 
318 matches a previously stored entry in CAM 312. MATCH_ADDRESS bus 326 contains the address of a 
matched data entry and DATA_OUT line 324 is used to output data entries previously stored in CAM 312. 
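
The masked search can be sketched in Python as follows. This is an illustrative model only: the function name cam_search is hypothetical, stored words are modeled as integers, and returning the lowest matching address is an assumed priority rule that the text does not specify.

```python
def cam_search(entries, data, mask):
    """Return (match_success, match_address) for a masked CAM search.

    A mask bit of 0 disables the comparison of the corresponding data bit.
    """
    for addr, word in enumerate(entries):
        if ((word ^ data) & mask) == 0:   # all enabled bits agree
            return True, addr
    return False, None


entries = [0b1010, 0b0110]
print(cam_search(entries, 0b0111, 0b1110))  # bit 0 is masked out of the search
print(cam_search(entries, 0b0111, 0b1111))  # every bit takes part
```

A hardware CAM compares all locations in parallel; the loop here only models the result of that comparison.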

FIG. 12 shows the different fields contained within each dictionary entry in the CAM. Each CAM data entry 
has three fields: a character field (CHAR) which is m-bits wide for storing the suffix character K, a code field 
(CODE) b-bits wide for storing the encoded character value OMEGA, and a status field (ST) two-bits wide for 
storing the dictionary status bits for the associated CODE and CHAR fields. The status field (ST) takes one 
of four possible values as follows: 

FREE: The CAM memory location is presently unused in the current dictionary; 

CD: The CAM location contains a data entry that belongs to the current dictionary, but not to the standby 
dictionary; 

SD: The CAM location contains a data entry that belongs to both the current and standby dictionaries; 
and 

INV: Invalid value, should not occur in normal operation. 

The binary values corresponding to FREE, CD, SD and INV are not fixed. The compressor and decompressor
operate as state machines that can be in any one of four possible states (S), where 0 ≤ S ≤ 3. The
specific binary values for the status field (ST) are functions FREE(S), CD(S), SD(S), and INV(S) of the state 
(S) and are defined in FIG. 13. For example, in state S=0, if the bits [0:0] exist in the status field of a CAM 
data entry, that memory location is FREE and regarded as not presently being used in the current dictionary. 
If the compressor/decompressor system is in state S=2, however, a CAM location with bit values [0:0] in its
status field is regarded as a data entry that has been assigned to the standby dictionary. 

Initially, the system is in state S=0, and all the ST fields are set to [0:0] (e.g., ST=FREE(S)). This is the 
only time a global initialization is necessary, as will be explained further below, minimizing the initialization
time delay that would occur during subsequent dictionary resets. The compressor, initially in state S=0, starts
reading input characters, compressing input strings and building, in parallel, the current dictionary (CD) and
standby dictionary (SD). When the CD becomes full, a dictionary switch occurs whereby the data entries in
the SD become the new data entries in the CD. The SD is essentially emptied, removing all valid data entries.

The dictionary switch occurs when the system makes the state transition S=0 -> S=1. Referring to FIG.
13, in state S=1, the free entries are those with ST=[1:0], which is the same as the CD value in state S=0. In
state S=1, CD entries are those with ST=[1:1], which is the same as the SD value in state S=0. The switch
occurs only when the CD becomes full, so all entries in the CAM will either be marked CD or SD (i.e., no entries
in the status field with a FREE value, and the value INV is never written). Therefore, immediately after the state
transition from S=0 to S=1, all entries have the value FREE or CD (with the exception of the initial
single-character strings, which are not actually kept in dictionary memory, as will be explained below). There
are no entries with the value SD, so the new SD starts empty. A similar situation occurs in the state transitions
S=1 -> S=2, S=2 -> S=3, and S=3 -> S=0. FIG. 14 illustrates the state transition changes for the
compression/decompression system as described above.

FIG. 15 illustrates a simple hardware implementation for changing compressor/decompressor states. The
initial bit values of a status register 328 are illustrated at state S=0. For each state transition the bits in the
status register shift cyclically so that FREE -> INV -> SD -> CD -> FREE. Thus state control is
implemented using an 8-bit cyclic shift register and shifting register 328 two bits to the left for each state change.

In describing the CAM-based standby dictionary compressor, the contents of a CAM memory location are
denoted by a triplet (ST, CODE, CHAR), and code(A) represents the encoded value for a single-character string
"A". For description purposes, codewords are assigned values corresponding to memory address locations.
However, codeword values are also easily derived as simple functions of memory address locations and would
be easily implemented by one skilled in the art. It is assumed that the codes (code(A)) within a predefined
address space (e.g., addresses 0 to 2^m - 1) are immediately available without need to access the dictionary. As
explained previously (see FIG. 5), the memory locations corresponding to these codes do not need to physically
exist in the CAM. Therefore, it is assumed that all CAM searches exclude these locations. For simplicity, "end
of file" conditions are also ignored.
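
The cyclic status register of FIG. 15 can be modeled with a short Python sketch. The field order inside the register (FREE, CD, SD, INV from the most significant bit pair down) is an assumption, since FIG. 15 is not reproduced here; it is chosen so that a two-bit left rotation moves the values FREE -> INV -> SD -> CD -> FREE as described. The function names are hypothetical.

```python
def rotate_left2(reg):
    """Rotate an 8-bit register two positions to the left."""
    return ((reg << 2) | (reg >> 6)) & 0xFF


def fields(reg):
    """Split the register into its four 2-bit status values."""
    return {
        "FREE": (reg >> 6) & 0b11,
        "CD": (reg >> 4) & 0b11,
        "SD": (reg >> 2) & 0b11,
        "INV": reg & 0b11,
    }


reg = 0b00_10_11_01  # assumed S=0 contents: FREE=00, CD=10, SD=11, INV=01
for state in range(4):
    print(state, fields(reg))
    reg = rotate_left2(reg)   # one state transition
```

After one rotation the FREE field holds the old CD value and the CD field holds the old SD value, so a dictionary switch needs no writes to the CAM itself. Four rotations return the register to its initial contents.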

Implementation of the CAM Based Multi-Dictionary System 

FIG. 16 is a detailed circuit diagram of the CAM-based multi-dictionary compression/decompression
system. The circuit diagram in FIG. 16 illustrates the additional functional components necessary to provide
multi-dictionary compression/decompression. The CAM compression/decompression circuit 312 is the same as that
illustrated in FIG. 11 and the status register 328 is the same as that illustrated previously in FIG. 15. A DATA_IN
register 342 and a MASK register 350 feed the ST, CODE, and CHAR fields through the DATA_IN and MASK
ports respectively of the CAM 312. The ST field for each data entry in the CAM is controlled directly through
the status register 328 or indirectly through a ST pattern generator 338. The ST pattern generator is illustrated
in detail in FIG. 17.

The specific CD and SD lines feeding the DATA_IN port are controlled through a multiplexer 340 (MUX
M1) by manipulating a control bus 314. The signals on control bus 314 come from a system processor (not
shown) and control compression/decompression functions within CAM 312. Control bus 314 includes [...]
read, write, search, and reset signals as previously illustrated in FIG. 5. The internal compressor/decompressor
control logic within CAM 312 is also similar to that illustrated in FIG. 5. Minor modifications to this logic may
be required to implement some of the specific features described below. These circuit modifications are easily
implemented by one skilled in the art and are therefore not illustrated in detail.

A line 326 couples the MATCH_ADDRESS port of CAM 312 to the DATA_IN port of CAM 312. An external
data bus 344 is coupled directly to the ADDRESS_IN port and coupled to the DATA_IN port through register
342. ST pattern generator 338 [...] a multiplexer 346 (MUX M2). A search type signal on line 349 and various
other control signals from a control generation circuit 352 are controlled by the MATCH_SUCCESS signal on
line 322. The DATA_OUT line 324 outputs compressed or decompressed data to data interfaces as shown in
FIG. 4. An internal address pointer 354 (NEXT_CODE) can write data to a second address pointer 356
(SAVE_CODE) or can receive data from [...].

FIG. 17 is a detailed diagram of the ST pattern generator 338 from FIG. 16. The first bit from the
CD field and the SD field of status register 328 (FIG. 16) are input to an AND gate 358 and an EXCLUSIVE-
NOR gate 362. The second bit from the CD and SD fields are coupled to AND gate 360 and an EXCLUSIVE-
NOR gate 364. The AND gates feed the ST field of the CAM DATA_IN port and the EXCLUSIVE-NOR gates
feed the ST field of the CAM MASK port (FIG. 16).

The compression/decompression system must be able to search for both CD and SD dictionary entries 
simultaneously (as discussed in detail below). This is performed by manipulating the bits in status register 328 
(FIG. 16). One of the bits in the CD and the SD will always match and the second bit will always be different
(see FIG. 13). Thus, the matching bit is used to search for a valid SD or CD dictionary entry and the second
bit is masked out. For example, in state S=0, the bit values for the current dictionary CD are [1:0] and the bit
values for the standby dictionary are [1:1]. This drives the outputs of AND gate 358 and EXCLUSIVE-NOR
gate 362 high and drives the outputs of AND gate 360 and EXCLUSIVE-NOR gate 364 low. Therefore, any ST
field in the CAM dictionary with a "1" located in its first bit position (e.g., CD(S) or SD(S)) is identified as a
valid dictionary entry of either the CD or SD. 
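
The gate-level behavior of the ST pattern generator can be checked with a small Python sketch. The function names st_pattern and st_matches are hypothetical, and the two status bits are packed into an integer with the first bit as the more significant one, which is an assumed convention for the example.

```python
def st_pattern(cd, sd):
    """Return (data, mask) for the 2-bit ST field from the CD(S) and SD(S) values."""
    data = cd & sd               # AND gates 358 and 360
    mask = ~(cd ^ sd) & 0b11     # EXCLUSIVE-NOR gates 362 and 364
    return data, mask


def st_matches(st, data, mask):
    """True if a stored ST value matches; mask bits of 0 are ignored."""
    return ((st ^ data) & mask) == 0


data, mask = st_pattern(0b10, 0b11)   # state S=0: CD=[1:0], SD=[1:1]
print([bin(st) for st in range(4) if st_matches(st, data, mask)])
```

The AND output supplies the bit value that CD(S) and SD(S) share, and the EXCLUSIVE-NOR output enables only that bit position, so a single search matches both dictionaries and nothing else.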

Data Compression 

The system in FIG. 16 compresses data in the following manner. The system is set to state S=0 by loading
the status register 328 with bit values as illustrated in FIG. 15. All ST fields in the CAM dictionary are set to
ST=FREE(S) (i.e., [0:0]) and the address pointer NEXT_CODE is set to the first available address in the CAM.
As discussed previously for the CAM illustrated in FIG. 5, the single input characters can be encoded
algorithmically during data compression, in which case all the CAM addresses are available for storing character
strings. If single data characters are stored in the CAM, however, the first available address for writing an
encoded character string will typically be the address location after the last single character location.

If necessary, a first input character is encoded by reading the first data character from input data line 344 
and generating the address for the input character/data entry match on line 326. The encoded first character 
(OMEGA) is then concatenated in register 342 with a second input character (K) from input data line 344 to 
generate an OMEGA,K character string. A search is performed in the CAM for a data entry that matches the
OMEGA,K string. At the same time, the ST field is searched for a CD or SD value that matches the value
generated by ST pattern generator 338 (e.g., an OMEGA,K string that has already been stored as a CD or SD entry).
All bits of SEARCH_TYPE signal 349 take a value "1" when searching for a match, which enables the CODE
and CHAR fields of the CAM mask. MUX M1 and MUX M2 select the ST fields for the MASK and DATA_IN
ports respectively from the ST pattern generator 338 as previously illustrated in FIG. 17.

Since this is the first OMEGA,K string fed into the CAM, the MATCH_SUCCESS signal on line 322 indicates
no match. In turn, OMEGA is output on line 324 and the character string CD(S), OMEGA, K is written into the
ST, CODE, and CHAR fields respectively at CAM dictionary location NEXT_CODE. The character K of the
OMEGA,K string is then encoded (code(K)) and used as the new value for OMEGA. The CD(S) value written
into the ST field is supplied directly from register 328 by altering the input of MUX 340 which feeds into the 
ST field of register 342. 

The system then searches for the next available CAM dictionary entry (e.g., ST=FREE(S)). Accordingly,
the SEARCH_TYPE signal 349 takes the value "0", masking out the CODE and CHAR fields and enabling the
ST field via the [1:1] bit values on line 348. At the same time, control line 314 coupled to MUX 340 selects the
value FREE from register 328 as the value searched in the ST field. The match address from line 326 is used
as the NEXT_CODE for storing the next unique OMEGA,K string. The process extracts the next character from
the input data string on line 344 and concatenates it with OMEGA, generating the OMEGA,K string for the next
search. If a match is found on the next search, the address location of the match is fed back into the DATA_IN
port on MATCH_ADDRESS line 326 for the next match attempt. This address is used as the new OMEGA value
representing the previous OMEGA,K string. At the same time, the SD(S) value from register 328 is written into
the ST field at the match address. 

As described above, after a new OMEGA.K string is written into a CAM location, a search is performed
to find the next FREE value in the status field. A failed search indicates the current directory is full and causes
the system to switch into state S=1. This is performed by rotating the contents of register 328 two bit positions
to the left. The status field locations previously having SD(S) values now constitute CD(S) values. Because
all the status fields in the CAM had been set to either CD(S) or SD(S) in state S=0 (e.g. no FREE status field
values exist just prior to the state change), all FREE memory locations in state S=1 will be previous CD(S)
entries from state S=0. In addition, the standby directory will be empty except possibly for the initial
single-character strings in state S=1 since the INV value is never written in state S=0. Compression continues as
described above with the system in state S=1. This process continues generating compressed data characters
and switching states until all of the input data is compressed. 
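
The compression sequence described above can be modeled in software. The following Python sketch is illustrative only and is not the circuit of FIG. 16. It assumes that the CAM can be represented as a table of [role, CODE, CHAR] entries, that status values can be kept as role names instead of bit patterns, and that the FREE search returns the lowest free address. The names compress, SWAP, find_free and find_match are hypothetical.

```python
# Role reassignment applied to every entry on a state change (FREE->INV->SD->CD->FREE).
SWAP = {"FREE": "INV", "INV": "SD", "SD": "CD", "CD": "FREE"}


def compress(data, alphabet, capacity):
    """Model of the standby-dictionary compressor (capacity must be >= 1)."""
    roots = {ch: i for i, ch in enumerate(alphabet)}   # single-character codes
    first = len(alphabet)                              # first multi-character address
    cam = {a: ["FREE", None, None] for a in range(first, first + capacity)}

    def find_free():
        # "Status field only" search for the lowest FREE address.
        return next((a for a in sorted(cam) if cam[a][0] == "FREE"), None)

    def find_match(omega, k):
        # Search CODE and CHAR, accepting only CD or SD entries.
        for addr, (role, code, ch) in cam.items():
            if role in ("CD", "SD") and code == omega and ch == k:
                return addr
        return None

    out = []
    if not data:
        return out
    next_code = first
    omega = roots[data[0]]
    for k in data[1:]:
        hit = find_match(omega, k)
        if hit is not None:
            cam[hit][0] = "SD"          # matched entry joins the standby dictionary
            omega = hit                 # match address becomes the new OMEGA
            continue
        out.append(omega)               # no match: emit OMEGA
        cam[next_code] = ["CD", omega, k]
        omega = roots[k]
        next_code = find_free()
        while next_code is None:        # current dictionary full: change state
            for entry in cam.values():
                entry[0] = SWAP[entry[0]]
            next_code = find_free()
    out.append(omega)                   # flush the last pending code
    return out
```

With the four-character alphabet "RINT" and five dictionary locations, compress("RINTINTIN", "RINT", 5) returns the codes 0, 1, 2, 3, 5, 3, 5 in this model. The state change occurs after the entry for code 5 followed by "T" is written.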
Data Decompression

Data decompression using the system in FIG. 16 is performed in the following manner. The CAM is
initialized by resetting the bits in register 328 to state S=0. The FREE bit values are written into the status field of
each available memory dictionary location. The internal address pointer 354 (NEXT_CODE) is set to the first
available memory location in the CAM (e.g. NEXT_CODE = 2^m) and internal address pointer 356
(SAVE_CODE) is set to zero.

Decompression is performed in the same manner as described above in FIG. 5. For example, the first en- 
coded character from the encoded character string (OMEGA) is read on line 344. OMEGA is then used as the 
address fed into the CAM ADDRESS_IN port. If the value of the CODE field output on line 324 is not a "root",
the CHAR field is output on line 324 and the CODE field is fed back into the CAM as the next address location. 
This process is repeated until a "root" CODE field is read out from the CAM. 

After a compressed input character (OMEGA) has been decompressed and the decompressed character
string output on line 324, the ST field at address location OMEGA is set to SD(S). This is performed by writing
the SD(S) value from register 328 into the ST field of register 342. The dictionary is then built by feeding back
the first character (K) from the decompressed data string into the CHAR field of register 342 at address location
SAVE_CODE. The CD(S) value from status register 328, the OMEGA value originally read over line 344, and
the first character from the decompressed OMEGA output string (K) are written into the ST, CODE, and CHAR
fields of the CAM dictionary at address location NEXT_CODE. The value of address pointer NEXT_CODE is
then written into address pointer SAVE_CODE. A "0" value is placed on line 349 and the [1:1] bit values on
line 348 allow a "status field only" search. The next dictionary entry in the CAM with a FREE status field is
then found by searching the ST fields for a FREE value. The address value of the FREE status field is written
into address pointer NEXT_CODE over line 326. The next encoded character OMEGA is then read from line
344.

If the current dictionary is full (e.g. no FREE status field values exist), the system is switched to state S=1 
by shifting the bits in register 328 as described above and the value of address pointer NEXT_CODE is reset.
The current dictionary will therefore only contain entries from the previous standby dictionary. The system then 
reads the next encoded character (OMEGA) from line 344 and the data decompression process is continued. 
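
The decompression sequence can be modeled in the same way. The following Python sketch is illustrative only. It assumes the same table-of-roles model as a software compressor (roles instead of bit patterns, lowest free address first), and it marks every dictionary address visited while expanding a code as SD, which is an assumption made so that the model stays in step with a compressor that marks every matched prefix. Each new entry is first written with a provisional CHAR and corrected at SAVE_CODE when the next code is expanded. The names decompress, SWAP and find_free are hypothetical.

```python
# Role reassignment applied to every entry on a state change (FREE->INV->SD->CD->FREE).
SWAP = {"FREE": "INV", "INV": "SD", "SD": "CD", "CD": "FREE"}


def decompress(codes, alphabet, capacity):
    """Model of the standby-dictionary decompressor (capacity must be >= 1)."""
    first = len(alphabet)                  # addresses below this are root codes
    cam = {a: ["FREE", None, None] for a in range(first, first + capacity)}

    def find_free():
        return next((a for a in sorted(cam) if cam[a][0] == "FREE"), None)

    out = []
    next_code, save_code = first, None
    for omega in codes:
        # Follow CODE links until a root is reached; characters come out last-first.
        chars, addr = [], omega
        while addr >= first:
            cam[addr][0] = "SD"            # assumed: mark every visited entry
            chars.append(cam[addr][2])
            addr = cam[addr][1]
        chars.append(alphabet[addr])
        word = "".join(reversed(chars))
        out.append(word)

        if save_code is not None:
            cam[save_code][2] = word[0]    # correct CHAR of the previous entry
        cam[next_code] = ["CD", omega, word[0]]   # provisional CHAR
        save_code = next_code
        next_code = find_free()
        while next_code is None:           # current dictionary full: change state
            for entry in cam.values():
                entry[0] = SWAP[entry[0]]
            next_code = find_free()
    return "".join(out)
```

In this model, decompress([0, 1, 2, 3, 5, 3, 5], "RINT", 5) returns "RINTINTIN". The provisional CHAR also covers the case in which a code refers to the entry written immediately before it, since the first character of that entry's string is then the correct CHAR.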

FIG. 18 is a data flow diagram showing the general method for data compression using a CAM with a
standby dictionary. Block 376 is an initialization process that sets the state and status conditions for the system.
Specifically, the system is set to state S=0, all status registers in the CAM dictionary are set to ST=FREE(S),
and the address pointer is set for the next available address in the CAM (e.g. 2^m -> NEXT_CODE).

A first character from an input data stream (CHAR -> K) is read in block 378 and encoded (e.g., code(K))
to provide the value OMEGA. The next input character (K) in the input data stream is read in block 380. Block
382 combines OMEGA and K together as a character string (e.g. concatenates OMEGA, K). A search is then
conducted that not only looks for a data entry matching the OMEGA.K string but that also matches one of two
alternate status register patterns (ST=CD(S) or ST=SD(S)). The search must look for both current and standby
values since either value indicates a valid character string (e.g. OMEGA.K) has been written into the CAM. 
For example, a status register value ST=CD(S) indicates that the associated CODE and CHAR fields have 
been previously loaded with an OMEGA.K character string during the present process state. A status register 
value ST=SD(S) indicates the associated CODE and CHAR fields have been loaded with an OMEGA.K char- 
acter string and have matched at least once in the present processor state with a second OMEGA.K character 
string. Thus, both status register values (CD(S) and SD(S)) indicate valid CAM data entries that should not be 

If no data string has yet been stored in the CAM, decision block 384 indicates that there is no match. The
encoded value OMEGA is output as the first character in the encoded data string in block 388. The OMEGA.K
string is written into the first available CAM address location (NEXT_CODE). The status field (ST) at the
address location NEXT_CODE is written with the value CD(S) indicating a valid data entry in the CAM. Block
388 then replaces OMEGA with the encoded value of the second input character (code(K) -> OMEGA).

Block 390 searches the CAM for the next available address location with ST=FREE(S). If a status register
with a FREE(S) value is not found, the current dictionary in the CAM is full. Decision block 392 thereby replaces
the current directory (CD) with the standby directory (SD) by changing the CAM into its next state S=S+1 mod
4. During a state change, the values of each status register are reassigned as previously described (see FIG.
13). The ST field values are reassigned as follows: FREE -> INV -> SD -> CD -> FREE. The process returns to
block 380, where the next input character (K) is read. The matching process is then repeated as described
above. If the current dictionary is not full, decision block 392 jumps to block 394. Block 394 determines the
next address in the CAM having a FREE status register value and assigns that address to NEXT_CODE (e.g.
match address -> NEXT_CODE). The process returns to block 380, where the next input character (K) is read.
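
The reassignment of status values on a state change can be pictured as four fixed 2-bit patterns whose meaning shifts by one role per state. The following Python sketch is illustrative only. The particular bit values in PATTERNS are an assumption chosen for the example, and the names ROLES, PATTERNS, pattern_for and role_of are hypothetical.

```python
ROLES = ["FREE", "CD", "SD", "INV"]
PATTERNS = ["00", "10", "11", "01"]   # assumed bit values, for illustration only


def pattern_for(role, state):
    """Bit pattern that represents `role` while the system is in `state`."""
    return PATTERNS[(ROLES.index(role) + state) % 4]


def role_of(pattern, state):
    """Role that a stored bit pattern plays while the system is in `state`."""
    return ROLES[(PATTERNS.index(pattern) - state) % 4]
```

No stored status field is rewritten during the state change in this model. A pattern written as SD in state 0, for example, is simply read as CD in state 1, and a pattern written as CD is read as FREE.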
If a match is indicated by decision block 384, the process jumps to block 386 where the OMEGA field is
replaced with the address of the dictionary entry that matched the OMEGA.K string. The matched character
string (represented by the CODE and CHAR fields at the match address) is automatically assigned to the
standby directory by setting the status field ST at the match address to SD(S). The process then returns to
block 380 where the next input character (K) is read from the data stream. The match address (OMEGA) and
K are concatenated to form a new OMEGA.K string which now represents three input characters. Block 382
then searches the current and standby dictionaries for a character string match.

Actual building of the standby dictionary is done in block 386 when the ST field, at the address of the
OMEGA.K character string match, is set to SD(S). This will often be "overkill", since this location might already
have been marked with SD(S). However, not having to check the ST field makes for a simpler hardware
implementation.

FIG. 19 is a data flow diagram showing the general method for data decompression using a CAM with a
standby dictionary. Block 398 initializes the system to state (S=0) and initializes the ST field for all available
dictionary entries to a value ST=FREE(0). The address pointer NEXT_CODE is set to the first free address
location (NEXT_CODE = 2^m + 1) and a second address pointer SAVE_CODE is set to zero. The first coded
character from the compressed data string (OMEGA) is read in block 400.

Block 401 decompresses OMEGA into a decompressed character string W as described above in FIG. 16.
For example, by using OMEGA as an address, the CHAR field at memory location OMEGA is output by the
CAM. If the CODE field from address OMEGA is not a "root", it is used as the next address fed into the CAM.
The CHAR field for the next address is then output as the next decompressed character K. If the CODE field
at address OMEGA is a "root", the CHAR field at address OMEGA is output and the CODE, CHAR fields at
address OMEGA are assigned to the standby dictionary (e.g., SD(S) -> ST). Block 402 assigns the first
character of character string W to a register C.

If the address pointer SAVE_CODE is not zero, decision block 403 jumps to block 404 where the dictionary
is built by writing the character string CD(S), SAVE_CODE, C into the CAM dictionary at address location
(NEXT_CODE). If SAVE_CODE is equal to zero or after block 404 has written the character string, block 405
assigns the status field at address location OMEGA to the standby dictionary (SD(S) -> (OMEGA)) and
replaces the present SAVE_CODE value with the value of OMEGA. Block 406 searches for the next status field
with a value ST=FREE(S). If a FREE ST field is located, decision block 408 jumps to block 410 where the match
address is assigned to address pointer NEXT_CODE (e.g. MATCH_ADD -> NEXT_CODE). The process then
returns to block 400 where the next encoded character from the compressed data stream (OMEGA) is read
and decompressed.

If no ST field has a FREE(S) value, decision block 408 jumps to block 412. The process is then changed 
to the next state causing the current dictionary to be switched with the standby dictionary (i.e., S=S+1 mod 
4). This also causes the current dictionary entries from the previous state to become FREE locations. Block 
413 searches for the next free location with ST=FREE(S), resets the value of address pointer SAVE_CODE
to zero, and jumps to block 410. Block 410 assigns the address value of the FREE location located in block 
413 to address pointer NEXT_CODE. Block 410 then returns to block 400 where the process continues until 
all the data from the compressed data stream is decompressed. 

FIG. 20 is a graphical depiction of the compression and decompression algorithms in FIGS. 18 and 19. A 
raw data stream 414 comprises an uncompressed string of characters which are input to the CAM compression 
process illustrated in FIG. 18. In this example, single characters R, I, N, and T have been loaded during
initialization into locations ADDR0, ADDR1, ADDR2, ADDR3 of memory 416 respectively. Single-character inputs
are encoded by assigning each character the value of its address location; however, to increase compression
speed, single-input characters can be encoded algorithmically prior to initiating the process described below.
Memory 416 illustrates the dictionary in state S=0 immediately after initialization and memory 418 illustrates 
the dictionary in state S=0 immediately before replacing the current dictionary with the standby dictionary (e.g. 
changing from state S=0 to state S=1). Memory 420 illustrates the dictionary in state S=2 after compressing 
raw data stream 414. 

Each memory location in the dictionary is separated into a status field (ST), a code field (CODE), and a 
character field (CHAR). For illustration purposes, it is assumed that there are only 5 dictionary locations in the 
CAM available for storing character strings (e.g. ADDR4-ADDR8). Address locations ADDR0-ADDR3 are des- 
ignated for single characters and are not searched as available dictionary locations. The bits of each status 
field are initialized to a value FREE=[0:0] (e.g., FREE(S)=0) and an address pointer NEXT_CODE is initialized
to the first available CAM memory location (NEXT_CODE = ADDR4). 

The first input character "R", from raw data stream 414, matches the data entry at address location ADDR0,
and is used as the first value of OMEGA (e.g. OMEGA=0). The compression system concatenates OMEGA
with the next input character "I" (OMEGA.K) and searches for a "0I" match in the CODE and CHAR fields in
memory 416. At the same time, the corresponding ST field in memory 416 is searched for the value
"10" or "11" (e.g. CD(S) or SD(S) in state S=0). All memory locations are FREE and no "0I" string has been
previously stored, so no match occurs. The value of OMEGA ("0") is output
as the first character in a compressed stream 422 and the character string (CD(S), OMEGA, K) is written into
the first FREE memory location (ADDR4). The character "I" is then encoded, generating the next value for
OMEGA.

The CAM dictionary is searched for the next ST field with a FREE value and that address location is
assigned to the address pointer NEXT_CODE (e.g. NEXT_CODE=5). The next character from the raw data
stream (K="N") is then concatenated with OMEGA (OMEGA="1") and the CAM is searched for the "1N"
character string. Again, no match will occur in the CAM and the character string (CD(S), 1, N) is written into address
location ADDR5. The process is repeated in the same manner, writing into the ST, CODE and CHAR fields of
the next available address after a character string is found not to match any previous entries.
The first character string match occurs where the characters "IN" recur in
the raw data stream 414. Characters "IN" comprise the encoded OMEGA.K character string ("1N"). Since the
CODE and CHAR fields at memory location ADDR5 are "1" and "N" respectively, and the status field was
previously set to CD(S), a match occurs. OMEGA is assigned the match address
(OMEGA=5) and the data entry at ADDR5 is assigned to the standby dictionary (ST=SD(S)=[1:1] for S=0). The
next character "T" is read from the raw data stream 414 and concatenated with OMEGA. The new OMEGA.K
string ("5T"), which now represents three characters, is then searched as previously described. No OMEGA.K
string with the value "5T" exists in the CAM, so it is written into the next available address location (ADDR8).

Memory 418 shows the CAM immediately after writing the character string ("5T") into
address location ADDR8. The process searches memory 418 for the next FREE status field. Assuming ADDR8
is the last available location in the CAM current dictionary, no FREE status field is found. This indicates that
the current dictionary is full and the system is accordingly changed to state S=1. In state S=1, the
bit values [1:0] constitute a FREE memory location, and bit values [1:1] constitute a current
dictionary entry (see FIG. 13). Therefore, all dictionary locations in state S=1, except the character
string in the current dictionary at ADDR5, are available for storing character strings. With a state change, the
address pointer NEXT_CODE is reset to the first FREE memory location (NEXT_CODE=4).

Referring to memory 420 in state S=1, the next input character 426 ("I") is then extracted from raw
character stream 414 and concatenated with OMEGA for the next OMEGA.K search ("3I"). The string "3I" resides
in memory location ADDR7; however, the status field at that location is now FREE. Therefore, no match is
found and the encoded value "3" is output as character 438 in compressed character stream 422. The
character string (CD(S), 3, I) is written into memory location ADDR4 and the character "I" is encoded as the next
value of OMEGA. The next FREE location is then assigned to address pointer NEXT_CODE
(NEXT_CODE=6). Note that address location ADDR5 is skipped because its status field indicates a current
dictionary entry. The next input character "N" is then concatenated with OMEGA and the CAM is searched for the
character string. A match occurs at address location ADDR5 and therefore OMEGA is assigned the
match address value, and the status field at address location ADDR5 is assigned to the standby dictionary.
The bit assignments for the standby dictionary in state S=1 are [0:1] (see FIG. 13). The next input character
from raw data stream 414 is concatenated with OMEGA and the search process is repeated. The process
continues, changing the state of the system each time the current dictionary "fills up", until all the characters from
raw data stream 414 are compressed.

Memory 432 illustrates the system in state S=0 immediately after initialization for
decompression. Memory 434 illustrates the system in state S=0 immediately before changing from state S=0 to state
S=1. Memory 436 illustrates the data entries in state S=1 after decompressing the compressed character
stream 422. The dictionary in memory 432 is initialized so that the first four address locations contain the
decoded values for characters R, I, N, and T respectively. Again, single character decoding may
be performed algorithmically. The system is set to state S=0 and all dictionary status fields are set to
FREE(S). The address pointer NEXT_CODE is set to the first available dictionary location (ADDR4) and the
address pointer SAVE_CODE is set to zero.

Decompression is performed as described above, whereby OMEGA is used as the address pointer into
memory 432. The first input code from compressed character stream 422 constitutes an OMEGA value
(OMEGA=0). The decompression system determines that the value "0" is a root codeword, for example, by checking
that the value is less than 4. The data entry at ADDR0 (e.g. "R") is thereby output as the first character in the
decompressed character stream 430. The status field at address location OMEGA is then set to SD(S).
The dictionary is then built by writing the first character K from the decompressed codeword (e.g. "R") back
into the CHAR field of address location SAVE_CODE. In this case, "R" is rewritten into the CHAR field of
address location ADDR0. The character string (CD(S), 0, R) is written into address location NEXT_CODE (e.g. ADDR4)
and SAVE_CODE is set to the value of NEXT_CODE (e.g. SAVE_CODE=4). The address pointer NEXT_CODE
is then assigned the value of the next free address in memory 434 (e.g. NEXT_CODE=5).

The next code from compressed character stream 422 (OMEGA=1) is then read.
OMEGA is decompressed and the decoded character "I" is output as the next character in decompressed
character stream 430. The ST field of address ADDR1 is set to SD(S) (e.g. [1:1]) and the first character from the
decompressed OMEGA value ("I") is written into the CHAR field at address location SAVE_CODE (ADDR4).
The character string (CD(S), 1, I) is then written into the ST, CODE, and CHAR fields of memory location
NEXT_CODE (ADDR5). SAVE_CODE is set to the value of NEXT_CODE, the next available
FREE location is located, and NEXT_CODE is set to that address (NEXT_CODE=6).

The process continues in the same manner for encoded characters "2" and "3" from compressed character
stream 422. The first "5" from compressed character stream 422 is the first non-root code word and the data
entry at ADDR5 is therefore read from the CAM. The CODE field at ADDR5 ("1") is used as
the next address fed into the CAM. The CHAR field
output at ADDR1 ("I") along with the previous CHAR field "N" are
output as the decompressed string, and the ST field at ADDR5 is set to SD(0). The first character of the
decompressed codeword ("I") is written into the CHAR field of memory location ADDR7 (e.g. SAVE_CODE=7), the
character string (CD(S), 5, I) is written into CAM location NEXT_CODE (e.g. ADDR8), and the value of
SAVE_CODE is set to the value of NEXT_CODE (e.g. SAVE_CODE=8). Memory 434 shows the state of the
current dictionary immediately after writing this character string into memory.

At this point, no status field in memory 434 has a FREE value. Therefore, the system is switched into
state S=1 and the status register values are reassigned as illustrated in FIG. 13. Referring to memory 436,
address pointer NEXT_CODE is assigned the first FREE memory location ADDR4. Address locations
ADDR0-ADDR3 and ADDR5 contain entries in the current directory while address locations ADDR4 and
ADDR6-ADDR8 are FREE in state S=1. Character 438 from compressed character stream
422 is read as OMEGA (OMEGA=3) and decompressed in the new decompression state S=1. The decoded
character "T" is output in decompressed character stream 430 and the ST field at ADDR3 is assigned the value
SD(S). SAVE_CODE points to ADDR8, so the character "T" is written into the CHAR
field at ADDR8 in memory 436. The character string (CD(S), 3, T) is written into address location ADDR4 and
SAVE_CODE is assigned the value NEXT_CODE. The next FREE dictionary location is ADDR6 and
accordingly becomes the new value of NEXT_CODE. The process continues in the same manner until all
characters in compressed character stream 422 are decompressed.

In traditional LZ implementations, codes are assigned sequentially, with single-character strings being
assigned codes in the following order: C_0, C_0+1, C_0+2, ..., C_0+(2^m-1), where C_0 is some small constant (e.g.
C_0=0). The new multiple character strings are assigned codes C_0+2^m, C_0+(2^m+1), ..., 2^b-2, 2^b-1 in that order.
Each subsequent character string has a sequential address value in the CAM. Hence, assignment of
codes to strings is achieved simply by keeping a counter initialized to C_0+2^m and incrementing it every time
a new string is added. Output codes of length (m+1) can be used
immediately after the dictionary reset, subsequently increasing the length of the output code by one bit
every time the number of entries in the dictionary reaches the next power of 2. Therefore, the length of the
output codes varies between (m+1) and b, where 2^b is the maximum size of the dictionary. This yields some
additional compression, since the compressor uses shorter codes when the dictionary address code
is small. The decompressor builds its dictionary in lock-step with the compressor, and can keep track of the
expected length of the compression codes.
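
The growth of the output code length can be expressed as a short calculation. The following Python sketch is illustrative only. It assumes C_0=0 and that the code length is chosen as the number of bits needed to address the entries assigned so far. The exact point at which the length is increased differs between implementations, and the name code_width is hypothetical.

```python
def code_width(entries, m, b):
    """Bits per output code when `entries` codes (0 .. entries-1) are assigned.

    m is the number of bits per input character and 2**b is the maximum
    dictionary size, so the result always lies between m + 1 and b.
    """
    needed = max(entries - 1, 0).bit_length()   # bits to represent the largest code
    return min(max(needed, m + 1), b)
```

For m=8 and b=12, code_width(257, 8, 12) is 9 and code_width(4096, 8, 12) is 12 in this model. A dictionary that holds more than 1/4 and at most 1/2 of its 2^b entries, addressed contiguously, needs b-1 bits.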

In the CAM system illustrated in FIG. 16, the encoded value for a new character string is the address of the
first FREE dictionary location. Immediately after a dictionary switch, the CD consists of character strings from
the previous standby dictionary, and their locations in the CAM are not necessarily contiguous. These strings
keep their addresses, and thereby their codes, after the switch. Therefore, the addresses (codes) of
future character strings do not form a contiguous sequence. Also, every encoded character string C in
the range 0 <= C <= 2^b-1 is potentially available immediately after dictionary reset.

As a consequence, the output stream must use fixed length codes. However, the negative impact of this
on compression ratio is not significant. Since the CD starts partially filled after a dictionary reset, even if the
entries were reordered, the number of bits required to represent them would not be far from the
maximum bit length (b). For example, in experiments, it was found that the current dictionary typically starts
between 1/4 and 1/2 full. This means that b-1 bits would be required after the switch even if the entries were
aligned in contiguous order. It is possible, however, to use a variable length code during the building of the
first current/standby dictionary (CD, SD), or the current dictionary could be reordered after each reset.






Compression Results 

The compression and decompression processes for the CAM multi-dictionary system were applied to
various types of files. The same files were compressed with a traditional LZW scheme using variable length output codes.
The overall results of the compressions are shown in FIG. 21. Line 440 is the graphical representation of the
[...] compressor (e.g. one less bit = 1/2 the required memory space). This compression ratio is
achieved with dictionary entries that are only 1 or 2 bits longer than a conventional LZW compressor data
[...] than once, would be changed to SD2 (a new name for the current INV value). At dictionary switching time, CD
[...] and SD2 entries become SD entries. The first
[...] scheme, which eliminates the need for dictionary initialization after power-up. Therefore, the CAM
multi-dictionary system achieves compression ratios comparable to traditional LZ
[...] complexity of the control circuitry.

IV. Selective Overwrite Method of Data Compression/Decompression in a CAM-Based Multiple Dictionary
System

A second Lempel-Ziv Standby Dictionary (LZSD2) data compression/decompression method
[...] data entries and keeping each available storage location
[...] a number of previously stored data entries after each dictionary swap.
LZSD2 Compression

Three dictionaries are used for LZSD2 compression. The Current Dictionary (CD) holds new strings and
strings demoted from the Standby Dictionary. The Standby Dictionary (SD) holds strings that have been
matched by an input character string, each match promoting the matching
data entry to the standby dictionary. The FREE Previous Dictionary (FREE/PD) contains storage locations
not currently assigned data entries and data entries which have been demoted from the Current Dictionary.
Data entries in the (FREE/PD) dictionary are selectably overwritten with new character strings. When the
FREE/PD locations are filled up, the CAM changes state, in turn creating in effect a dictionary swap.

The dictionary swap changes the priority in which data entries are overwritten with new input character
strings. For example, data entries in the CD are demoted to FREE/PD and data entries in the SD are demoted
to CD. Thus, after a dictionary swap, data entries previously assigned to the CD, which were not
subject to being overwritten with new character strings, are now demoted to FREE/PD and are subject to being
overwritten with encoded character strings.
Referring to FIGS. 22A-22E, complete and continuous utilization of all dictionary space is carried out
generally by searching the three dictionaries SD, CD and FREE/PD at the same time. Initially, all CODE and CHAR
fields for each available storage location in the CAM are reset to a known value, typically null, and assigned to
the FREE previous dictionary (FREE/PD) as shown in FIG. 22A. Available storage locations refer to address
locations in the CAM that are available for storing a data entry. A dictionary data entry comprises a string that
includes PREVCODE N, which is the address of the best dictionary match that has been found so far, and CH
N, which is the most recent character from the input data stream.

The first character string (PREVCODE1,CH1) is stored in the first available address location (ADDR0) in the
FREE/PD dictionary and is assigned to the Current Dictionary (CD). The next unique character string
(PREVCODE2,CH2) is stored in the next available FREE/PD storage location (ADDR1) and also assigned to CD.
Character strings (PREVCODE3,CH3) and (PREVCODE4,CH4) are stored in the CAM at the next available
addresses ADDR2 and ADDR3, respectively, and both are assigned to CD.

Referring to FIG. 22B, new unique character strings continue to be stored in available FREE/PD storage
locations and assigned to CD. If the compression process receives a new character string that has already
been stored in the CAM as a data entry and not overwritten, the data entry is reassigned or promoted to the
standby dictionary (SD). For example, data entry (PREVCODE1,CH1) has previously been stored at address
location ADDR0. Therefore, if a new character string contains the (CODE,CH) values (PREVCODE1,CH1), the
data entry at ADDR0 is reassigned to SD.

Previously stored data entries (PREVCODE1,CH1), (PREVCODE3,CH3), and
(PREVCODE4,CH4) are assigned to the standby dictionary since each such data entry matched a new input character
string. The CAM remains in the present state until each available CAM storage location (e.g., the FREE/PD
location at ADDR7) is filled with a data entry assigned to either the CD or SD as shown in FIG. 22C. FIG. 22C
shows the available FREE/PD location at address location ADDR7 replaced with a data entry prior to the
dictionary swap. FIG. 22D shows the status of each data entry immediately after the dictionary swap. For example, all
data entries previously assigned to the SD are reassigned to the CD (SD->CD) and all data entries previously
assigned to the CD are reassigned to FREE/PD (CD->FREE/PD). It is important to note that after the dictionary
swap all data entries remain assigned to a dictionary. There will be no standby dictionary entries after the swap.
For example, the data entries previously assigned to the CD at address locations ADDR1 and ADDR4-ADDR7
are reassigned to FREE/PD. Therefore, all data entries remain available for character encoding after a CAM
state change. Compression performance is maintained since no previously encoded compression data is
lost after a CAM reset.

The LZSD2 method also allows the dictionary to adapt for new input data by selectively replacing data entries
assigned to FREE/PD with new character strings not previously stored in the CAM. Specifically, if a new
character string matches a data entry in either FREE/PD or the current dictionary (CD), the matching data entry
is reassigned to the standby dictionary (SD). For example, in FIG. 22E, the next input character string
(PREVCODE1,CH1) matches the CD data entry at address location ADDR0. Therefore, the data entry at ADDR0
is reassigned to SD. Further, the input character string (PREVCODE5,CH5) matches a data entry assigned to
FREE/PD, and that entry is reassigned from FREE/PD to SD
(PREVCODE5,CH5 (FREE/PD) -> PREVCODE5,CH5 (SD)).

If an input character string does not match any existing data entry, a new character string is put into a
FREE/PD storage location and assigned to CD. For example, the input character string
(PREVCODE9,CH9) does not match any data entry in the CAM. Thus, (PREVCODE9,CH9) is written into the CAM
storage location of the FREE/PD with the lowest address (i.e., ADDR1) and assigned to CD
(PREVCODE9,CH9 (CD)). It is also possible to assign data entries based on criteria other than lowest FREE/PD
address location. If the same string (PREVCODE9,CH9) recurs before a subsequent reset and overwrite, this
entry would then be promoted to SD.

It can be seen that all data entries in the CD dictionary and those in the FREE/PD dictionary that have 
not yet been overwritten are still utilized at all times for character string matching. Since all data entries remain 
assigned to a dictionary after a CAM state change, no compression information is lost. Thus, the dictionaries 
can be continuously updated without temporary degradations in the data compression rate. 
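
The promotion, overwrite and swap rules described above can be summarized in a small model. The following Python sketch is illustrative only. It assumes that the CAM can be represented as a list of [dictionary, string] entries indexed by address, that the lowest FREE/PD address is overwritten first, and that the swap is triggered as soon as no FREE/PD location remains. The names lzsd2_update and find_free are hypothetical.

```python
def find_free(cam):
    """Lowest address currently assigned to FREE/PD, or None."""
    return next((a for a, e in enumerate(cam) if e[0] == "FREE/PD"), None)


def lzsd2_update(cam, string):
    """Apply one input string to a non-empty CAM model and return its address."""
    # All three dictionaries (FREE/PD, CD, SD) are searched at the same time.
    hit = next((a for a, e in enumerate(cam) if e[1] == string), None)
    if hit is not None:
        cam[hit][0] = "SD"                 # a match promotes the entry to SD
        addr = hit
    else:
        addr = find_free(cam)              # overwrite the lowest FREE/PD location
        cam[addr] = ["CD", string]
    while find_free(cam) is None:          # FREE/PD exhausted: dictionary swap
        for entry in cam:
            entry[0] = "CD" if entry[0] == "SD" else "FREE/PD"
    return addr
```

Starting from eight FREE/PD locations, writing strings 1 through 8 while strings 1, 3 and 4 recur leaves ADDR0, ADDR2 and ADDR3 in CD and the remaining locations in FREE/PD after the swap in this model. A later unmatched string is then written to ADDR1.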

There are several methods for selecting the data entry in the FREE/PD that is overwritten when a new 
input character string is identified in the compression process. As discussed above, a new character string 
that does not match any data entry in either the FREE/PD, CD or SD is overwritten into the storage location
in FREE/PD with the lowest address. When the dictionary is initially built, by incrementing the address mono- 
tonically (as compared to using a hashing scheme), the lowest FREE/PD address will be the oldest dictionary 
entry. This situation will remain true, however, only until all FREE/PD locations have been overwritten once.
Alternatively, individual data entries in the FREE/PD dictionary can also be deterministically selected for re- 
placement with new input character strings. 

For example, data entries can be overwritten in FREE/PD according to how long the prior data entry has 
resided in a dictionary. In this example, each data entry can be assigned a tag that identifies the order in which 
it was written into a CAM storage location. The LZSD2 search process then selects the FREE/PD data entry 
with the tag value indicating it was least recently used (LRU). The least recently used data entry in FREE/PD 
would be the data entry that has resided in the CAM for the largest amount of time without matching an encoded 
character string. 
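
The tag-based selection can be sketched as follows. This Python fragment is illustrative only. It assumes that each entry carries a numeric tag that grows with time and is refreshed whenever the entry is written or matched, so that the smallest tag in FREE/PD identifies the least recently used entry. The name pick_victim is hypothetical.

```python
def pick_victim(entries):
    """Return the address of the FREE/PD entry with the smallest tag.

    `entries` is a list of (dictionary, tag) pairs indexed by address.
    Returns None when no entry is assigned to FREE/PD.
    """
    candidates = [(tag, addr) for addr, (role, tag) in enumerate(entries)
                  if role == "FREE/PD"]
    return min(candidates)[1] if candidates else None
```

Entries assigned to CD or SD are never selected in this model, regardless of their tags.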

The LRU data entry in some situations may have the highest probability of not matching a new character 
string. Therefore, overwriting the LRU data entry has the potential of minimizing any loss of compression in- 
formation that could occur when an existing data entry is replaced. Utilizing tags to identify LRU data entries 
is described in detail by Bunton and Borriello in PRACTICAL DICTIONARY MANAGEMENT FOR HARDWARE 
DATA COMPRESSION, Communications of the ACM, January 1992, Vol. 35, No. 1.
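
The tag-based selection described above can be sketched as follows. This is a minimal illustration and not the circuit of the cited paper: the function name `select_lru_free`, the list-based model of the CAM, and the convention that a smaller tag value means an older entry are all assumptions made for the example.

```python
def select_lru_free(statuses, tags):
    """Return the address of the FREE/PD entry with the oldest tag, or None.

    statuses[a] is the dictionary assignment of CAM address a.
    tags[a] is a hypothetical counter recorded when the entry was last
    written or matched (smaller means older).
    """
    candidates = [a for a, s in enumerate(statuses) if s == "FREE/PD"]
    if not candidates:
        return None  # nothing may be overwritten; a dictionary swap is needed
    # Ties on the tag value fall back to the lowest address.
    return min(candidates, key=lambda a: (tags[a], a))
```

With the lowest-address policy the result depends only on the status fields; with tags, the entry that has gone longest without a match is replaced first.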

FIG. 23 is a data flow diagram showing the general method for LZSD2 data compression. The compression/decompression
system shown in FIG. 4 is initialized for LZSD2 compression in block 450. Input characters
(CH) from the input character string are read one at a time in block 452. If an End of File (EOF) condition is
identified in decision block 456, decision block 454 then checks whether it is the first time through the
compression cycle. If the EOF condition is encountered the first time through the compression cycle, decision
block 454 ends the LZSD2 compression process. If it is not the first time through the compression cycle when
the EOF condition is detected, block 460 outputs the previously matched sequence PREVCODE
and block 464 provides additional cleanup to ensure proper formatting of the encoded output characters.

Referring back to decision block 456, if an EOF condition is not identified, block 458 searches all three
dictionaries (i.e., FREE/PD, CD, and SD) for the extended string (PREVCODE,CH). If the (PREVCODE,CH)
string is matched with a previously stored data entry, decision block 462 jumps to block 472. If (PREVCODE,CH)
is not already in the SD, block 472 reassigns the CAM location with the matching (PREVCODE,CH)
data entry to the Standby Dictionary. The matching data entry is reassigned into the SD by changing
the status bits. The (PREVCODE,CH) string is then encoded using the memory address of the matched
data entry and assigned to PREVCODE, that is, CODE(PREVCODE,CH)->PREVCODE. Block 472 then jumps
to block 452 where the next input character (CH) is combined with the encoded value of (PREVCODE). In this
way, the status bits for the stored codes for each substring within a matched string are updated so they will
be retained in the subsequent reset.

Once a string has been extended to the point that a match does not occur in decision block 462, block 466
outputs PREVCODE as the best match found. If there is an available FREE/PD location, block 468 updates
the dictionary by writing (PREVCODE,CH) into the next available address (e.g., the FREE/PD location with the
lowest address) and assigns it to the Current Dictionary (CD). If there are no available FREE/PD locations, block
468 updates the dictionaries by swapping the current dictionary into the FREE/PD dictionary and swapping
the standby dictionary into the current dictionary by changing the status bits, that is, (CD->FREE/PD,
SD->CD). Block 470 prepares for the next input character string by assigning CH to PREVCODE (CH->PREVCODE).
There are alternate mappings from single character strings to compressed codes. The compression
process then returns to block 452 to read and combine the next input character CH with PREVCODE.
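
The general flow of FIG. 23 can be modelled in software as shown below. This is a simplified sketch under stated assumptions, not the hardware implementation: the CAM is a Python list, a match at CAM address a is emitted as the code 256 + a (an assumed mapping, with codes below 256 standing for single characters), and the LAST_CODE_BUILT test, the MAXDEPTH test, and GROW codes described for FIG. 24 are omitted. The name `lzsd2_compress_sketch` is hypothetical.

```python
FREE, CD, SD = "FREE/PD", "CD", "SD"

def lzsd2_compress_sketch(data, size):
    """Compress bytes with a toy three-status dictionary of `size` entries.

    Returns (codes, cam), where each cam entry is [code, char, status].
    """
    cam = [[None, None, FREE] for _ in range(size)]
    codes = []
    prev = None
    for ch in data:
        if prev is None:              # first character of the stream
            prev = ch
            continue
        # Search every entry regardless of status; the lowest address wins.
        match = next((a for a, e in enumerate(cam)
                      if e[0] == prev and e[1] == ch), None)
        if match is not None:
            cam[match][2] = SD        # promote the matched entry to standby
            prev = 256 + match
            continue
        codes.append(prev)            # output the best match found
        free = next((a for a, e in enumerate(cam) if e[2] == FREE), None)
        if free is not None:
            cam[free] = [prev, ch, CD]    # overwrite lowest FREE/PD entry
        else:
            swap = {CD: FREE, SD: CD}     # CD->FREE/PD, SD->CD
            for e in cam:
                e[2] = swap.get(e[2], e[2])
        prev = ch
    if prev is not None:
        codes.append(prev)            # flush the last match at end of input
    return codes, cam
```

Note that the swap only rewrites status values; the code and character fields of every entry stay in place, so entries demoted to FREE/PD can still be matched until they are overwritten.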

FIG. 24 is a detailed data flow diagram for the LZSD2 compression scheme shown in FIG. 23. The following 
variables are used to describe LZSD2 compression and decompression. 

CAM               Content Addressable Memory. Each dictionary entry contains ([MAXBITS bit code field], [8 bit character field], [2 bit status field]).
CD                Two bit Status value which indicates that the dictionary entry is in the Current Dictionary.
CH                Eight bit variable which contains the most recent input character.
CODE_SIZE         The number of bits currently used for each output code. The minimum is 9 and the maximum is MAXBITS, which is determined by the dictionary size: 2^MAXBITS >= (Number of dictionary entries) + (Number of root codes) (typically 256) + (Number of control codes).
DEPTH             Variable which contains the number of characters in the string represented by PREVCODE during compression or INCODE during decompression. The size of this variable is determined by MAXDEPTH.
EOF               A flag which indicates, when set, that an attempt to read data from the input stream failed because the end of the data stream was reached.
FIRST_CHAR        Eight bit variable which contains the first character of the string represented by INCODE.
FOUNDCODE         When the dictionary is searched for a data entry that matches the input data string, and a match is found, this MAXBITS bit variable is assigned the address at which the match was found.
FREE/PD           Two bit Status value which indicates that the data entry is in the Previous Dictionary. It also indicates that the location can be overwritten.
GROW              A CODE_SIZE bit control code which signals the decompressor to start reading one more bit for each compressed code.
INCODE            MAXBITS bit variable whose value is read from the compressed data stream.
INVALID           This is any MAXBITS bit code that is not a dictionary entry, i.e., INVALID may represent a control code or a root code.
LAST_CODE_BUILT   MAXBITS bit variable which contains the address of the most recently built code.
MATCH             This indicator is true if a search of the dictionary succeeded in finding a match.
MAXBITS           Maximum number of bits in an output code.
MAXDEPTH          The maximum string length that a code is allowed to represent.
NEXTCODE          MAXBITS bit variable which contains the address of the dictionary entry that is to be overwritten with the new character string.
PREVCODE          MAXBITS bit variable which contains the address of the best dictionary match that has been found so far during compression. During decompression PREVCODE represents what INCODE was during the previous cycle.
PREVDEPTH         The string length of PREVCODE during decompression only.
SD                Two bit Status value which indicates that the dictionary entry is in the Standby Dictionary.
STACK             An eight bit by MAXDEPTH LIFO queue used for string reversal.
SWAP_FLAG         Indicator flag that is true when a dictionary swap is needed.
TCODE             A MAXBITS bit variable used as a temporary storage location when decoding INCODE.
TDEPTH            Temporary variable used to keep track of the STACK depth while it is being emptied.

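
The sizing relation given for CODE_SIZE and MAXBITS can be worked through numerically. The sketch below is illustrative only: the function name `min_maxbits` and the default of a single control code are assumptions, and the floor of 9 bits simply reflects the minimum code size stated above.

```python
def min_maxbits(entries, root_codes=256, control_codes=1):
    """Smallest MAXBITS with 2**MAXBITS >= entries + root_codes + control_codes."""
    total = entries + root_codes + control_codes
    # (total - 1).bit_length() equals ceil(log2(total)) for total >= 1.
    return max(9, (total - 1).bit_length())
```

For example, 255 dictionary entries plus 256 root codes and one control code need exactly 512 code values, which fit in 9 bits; one more entry pushes MAXBITS to 10.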
Referring to FIG. 24A, at start up, block 474 puts the compression/decompression circuit previously shown
in FIG. 4 in a known, consistent state. Each dictionary entry is set to a predetermined value. For example, the
code field, character field, and status field are typically set to 000 Hexadecimal (HEX), 00 HEX, and FREE/PD,
respectively. Thus, every dictionary location in the CAM contains the two character string (NULL, NULL,
FREE/PD). It is also possible to initialize each dictionary entry to different string values to further increase
the compression ratio. For example, character string combinations that occur frequently in the input character
stream can be written into the FREE/PD dictionary prior to beginning the compression process.

Output format control is carried out by the compressed data interfaces 138 and 148 (FIG. 4), which are reset
by block 474 to an empty initial state. The CODE_SIZE variable/register is typically set to a minimum value such
as 9, the LAST_CODE_BUILT register is set to INVALID, the DEPTH register is set to 0, and SWAP_FLAG is
unset.

Block 476 reads an eight bit character from the input character stream and assigns it to variable CH. De- 
cision block 478 determines whether a data read failure occurs due to reaching the end of the input stream 
(i.e., EOF flag). If an EOF condition is detected, decision block 486 ends the compression process and outputs 
any remaining encoded information. If the data read process succeeds (i.e., no EOF condition), decision block 
478 continues the LZSD2 compression process. 

When a data read failure occurs, decision block 478 jumps to decision block 486 where the string length 
of PREVCODE is checked. If DEPTH = 0, PREVCODE has a 0 string length and cannot be output. Since the 
EOF flag also indicated that the end of the input data stream has been reached, DEPTH=0 indicates that there 
is nothing left to output; compression is finished and block 486 ends the compression process. DEPTH = 0 
only after initialization which means that no data was input. If DEPTH > 0, decision block 486 jumps to decision 
block 492. 
If the data read was successful, decision block 484 checks the value of the DEPTH variable/register. If
DEPTH = 0, PREVCODE has a 0 string length and cannot be used in dictionary searches. Therefore, block
480 assigns the input character CH to PREVCODE (PREVCODE = CH). PREVCODE now represents a one
character string, so DEPTH is set to 1. Block 480 then jumps back to block 476 to read the next input character
CH.

If DEPTH > 0, PREVCODE has a valid character string and the compression process continues to block
488. Block 488 searches all three dictionaries (FREE/PD, CD, and SD) at the same time for a data entry having
PREVCODE in the code field and CH in the character field (PREVCODE,CH). The simultaneous search of all
three dictionaries is performed by simply searching only the code and character fields and disregarding the
status field.

It is possible for multiple addresses to match the (PREVCODE,CH) string; therefore, multiple matches must
be reduced to one. This is done by using a priority encoder which will select the match address with the lowest
value. When a location is found that matches (PREVCODE,CH), block 488 sets the MATCH flag, assigns the
match address to FOUNDCODE, and goes to decision block 490.
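
A masked CAM search followed by a lowest-address priority encoder can be modelled in a few lines. The sketch is a software analogy only: `cam_search` is a hypothetical name, each entry is a (code, character, status) tuple, and passing `None` for a field stands in for masking that field out of the comparison.

```python
def cam_search(cam, code=None, ch=None, status=None):
    """Return the lowest address whose unmasked fields all match, or None.

    A field left as None is masked (ignored), mirroring a search that
    compares only the code and character fields, or only the status field.
    """
    for addr, (e_code, e_ch, e_status) in enumerate(cam):
        if code is not None and e_code != code:
            continue
        if ch is not None and e_ch != ch:
            continue
        if status is not None and e_status != status:
            continue
        return addr  # first hit in address order is the lowest address
    return None
```

Searching with only `code` and `ch` set corresponds to the (PREVCODE,CH) lookup across all three dictionaries; searching with only `status` set corresponds to looking for an overwritable FREE/PD location.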

Locating a (PREVCODE,CH) match in the dictionary is not enough to determine whether FOUNDCODE
is an acceptable output code. Two additional tests must also be passed in decision block 490. First, the
decompressor will incorrectly decode FOUNDCODE if such code was just built by the compressor. Therefore,
FOUNDCODE cannot be equal to LAST_CODE_BUILT, which prevents the most recently built dictionary entry
from being used as an output code.

Also, in certain rare cases, it is possible that a string in the dictionary is longer than MAXDEPTH. If a code
longer than MAXDEPTH were output, the decompressor string reversal register (see FIG. 4) would overflow
and cause an error. To prevent this, the string length of FOUNDCODE is also checked in decision block 490.

If the (PREVCODE,CH) string is matched in the dictionary, LAST_CODE_BUILT is not equal to FOUNDCODE,
and DEPTH < MAXDEPTH, decision block 490 jumps to block 482 where the dictionary data entry that
matches the (PREVCODE,CH) character string is reassigned to the Standby Dictionary (SD). The data entry
is reassigned by setting the status field at address location FOUNDCODE to SD. Block 482 then sets PREVCODE
equal to the best string match found so far, namely FOUNDCODE (PREVCODE = FOUNDCODE), and
the DEPTH variable is incremented to the new string length of PREVCODE (i.e., DEPTH is incremented by 1).
Block 482 then jumps to block 476 where the next input character (CH) is read and appended to the new
PREVCODE value, creating the new string (PREVCODE,CH).

Referring back to decision block 490, if the (PREVCODE,CH) string does not match a dictionary data entry
or if either of the two other tests performed in decision block 490 fail, block 496 outputs PREVCODE and the
(PREVCODE,CH) string is assigned to the current dictionary in block 514 (i.e., PREVCODE,CH,CD) as described
below.

Before being output, the number of bits in PREVCODE is checked in decision block 492 (see FIG. 24B).
If PREVCODE is greater than or equal to 2^CODE_SIZE, it cannot be represented by the present number of CODE_SIZE
bits (e.g., 9). In this case, block 494 increases CODE_SIZE by one so that all future output codes are
represented by an additional bit. Block 494 also outputs a GROW control code using CODE_SIZE bits, which must
be packed into bytes by the formatter circuit (FIG. 4) before it can be output. The GROW code is a signal to
the decompressor (see FIG. 26A) that all future codes will be one bit longer than the current code size. It is
possible for PREVCODE to require more than one additional bit in order to be output. Therefore, block 494 jumps
back to decision block 492 and checks if another GROW must be sent before actually outputting PREVCODE.
Block 496 then outputs PREVCODE using CODE_SIZE number of bits. The CODE_SIZE number is used by
the formatter to pack PREVCODE into bytes before being output.
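
The loop between decision block 492 and block 494 can be illustrated as follows. This sketch rests on two assumptions that the text does not settle: the GROW code is emitted at the width in force before the increment (so that the decompressor can still read it), and its numeric value is taken to be 256. The function name `emit_with_grow` is hypothetical, and the MAXBITS ceiling is not enforced here.

```python
def emit_with_grow(prevcode, code_size, grow_code=256):
    """Return ([(value, bit_width), ...], new_code_size) for one output code."""
    out = []
    # Keep widening until prevcode fits in code_size bits.
    while prevcode >= (1 << code_size):
        out.append((grow_code, code_size))  # GROW sent at the old width (assumed)
        code_size += 1
    out.append((prevcode, code_size))       # finally output PREVCODE itself
    return out, code_size
```

A code that needs two extra bits therefore produces two GROW codes ahead of it, each one bit wider than the last, which matches the remark that more than one GROW may be required.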

Decision block 498 is a continuation of the EOF check previously performed in decision block 478. A
detected EOF condition in decision block 478 may come back into the main compression flow at block 492 in
order to output the last best match code (PREVCODE) (see decision block 486). In addition, the last code output
may not completely fill the last output byte. Statistically, only 1 out of 8 output codes will do so. Therefore, block
500 pads the leftover bits with 0's or 1's, if needed, before outputting the final byte. At this point, the
compression process is complete.

If no EOF flag is detected, decision block 502 checks whether the SWAP_FLAG is set. If the SWAP_FLAG
is set, the dictionaries are swapped by replacing FREE/PD with CD, and replacing CD with SD (SD->CD,
CD->FREE/PD). Swapping the dictionaries does not actually change any data in the CAM but changes how the
status field is interpreted by the compression engine (see FIG. 17). After a dictionary swap, the status register
code that previously represented SD now represents CD, the status register code representing CD now
represents FREE/PD, FREE/PD becomes INV, and INV becomes SD. INV remains empty because FREE/PD is
always empty before the swap, thereby keeping INV empty after the swap. Also, since INV was empty before
the swap (INV is always empty), the Standby Dictionary (SD) is also empty after the swap.
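
The reinterpretation of the status field after a swap can be expressed as a rotation over four two-bit codes. The sketch below is an illustration under assumptions: the bit patterns are taken from the state table of FIG. 13 as far as it can be read in this copy, the state counter S advances by one (modulo 4) on each swap, and the names `status_code` and `role_of` are hypothetical.

```python
ROLES = ("FREE/PD", "CD", "SD", "INV")
RING = (0b00, 0b10, 0b11, 0b01)   # codes for FREE/PD, CD, SD, INV in state S=0

def status_code(role, state):
    """Two-bit status code that represents `role` when the CAM is in `state`."""
    return RING[(ROLES.index(role) + state) % 4]

def role_of(code, state):
    """Dictionary role that a stored two-bit `code` has in `state`."""
    return ROLES[(RING.index(code) - state) % 4]
```

Advancing the state from 0 to 1 leaves every stored status code untouched, yet the code that meant SD now reads as CD, the code that meant CD reads as FREE/PD, FREE/PD reads as INV, and INV reads as SD, which is the behaviour described above.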
If decision block 502 determines that the SWAP_FLAG is not set, decision block 506 (FIG. 24C) checks
to see if the number of characters in PREVCODE (DEPTH) is less than the maximum allowable string length
(MAXDEPTH). If DEPTH = MAXDEPTH, the character string (PREVCODE,CH) is too long to be used as an
output code and, therefore, will not be added to the current dictionary. If DEPTH < MAXDEPTH and there is
a location available, the character string (PREVCODE,CH) is added to the current dictionary.

First block 508 searches the CAM for an available FREE/PD location by checking the CAM status fields 
while ignoring the code (PREVCODE) and character fields (CH). Thus, a match can be successful regardless 
of code or character field values. Since it is possible that there is more than one FREE/PD data entry, a priority 
encoder selects the match address with the lowest value. 

As mentioned above, it is also possible to select between multiple matches according to which FREE/PD
dictionary entry was least recently used. For example, a tag could be associated with each dictionary entry
indicating the order in which the entries were stored in the CAM. The priority encoder would then select the
least recently used data entry from among the multiple matches. The least recently used data entry is most
probably the character string that is least likely to match a new encoded character string. Thus, replacing the
least recently used data entry minimizes the effect of losing a small amount of compression information.
Alternate priority selection methods are also capable of being implemented.

If a FREE/PD status field is located, decision block 510 jumps to block 514 where (PREVCODE,CH) is
added to the CAM at the matched address location. Block 514 writes the (PREVCODE,CH) string into the CAM
at address NEXTCODE and assigns the string to the current dictionary. The string (PREVCODE,CH) stays in
CD until a match occurs with a new input character string, whereby (PREVCODE,CH) is then promoted to SD.
Otherwise, (PREVCODE,CH) stays in CD until a dictionary swap is performed, then it is reassigned or demoted
to FREE/PD.

If the search for FREE/PD fails, block 512 sets the SWAP_FLAG indicating that a dictionary swap is needed.
Failure to find a FREE/PD status field also means that the string represented by (PREVCODE,CH) will
not be entered into the dictionary. The dictionary swap is delayed until the next compression cycle (see
decision block 502) in order to maintain synchronization with the decompression dictionary.
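
The dictionary update path through blocks 506 to 514 can be summarised in a short routine. It is a sketch under assumptions rather than the circuit's behaviour: the CAM is a list of [code, character, status] entries, `None` stands in for the INVALID value of LAST_CODE_BUILT, and the name `update_dictionary` is hypothetical.

```python
def update_dictionary(cam, prevcode, ch, depth, maxdepth):
    """Try to add (prevcode, ch) to the Current Dictionary.

    Returns (last_code_built, swap_flag): the address written (or None,
    standing in for INVALID) and whether a dictionary swap is requested.
    """
    if depth >= maxdepth:
        return None, False            # string too long; nothing is built
    for addr, entry in enumerate(cam):
        if entry[2] == "FREE/PD":     # lowest-address FREE/PD entry wins
            cam[addr] = [prevcode, ch, "CD"]
            return addr, False
    return None, True                 # no FREE/PD location: request a swap
```

The caller would act on the returned swap flag only at the start of the next cycle, which is the delay the text describes for keeping the two dictionaries in step.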

For example, the LZSD2 decompressor (see FIGS. 25 and 26 below) performs the status field update in
a different order than the compressor and initiates the dictionary swap immediately after failing to find a
FREE/PD status field. The LZSD2 decompressor updates the status field bits for a given codeword and then
attempts to write the previous codeword into the dictionary. Therefore, delaying the dictionary swap in the
LZSD2 compression process until after the next code's status fields have been updated allows the compressor
and decompressor dictionaries to be identical when a dictionary swap occurs.

If no codeword was built during the decompression cycle, block 516 sets LAST_CODE_BUILT to an invalid
value. The most recently built codeword can then be used in future matches. Therefore, if a (PREVCODE,CH)
character string was not built because the maximum string length was exceeded (DEPTH = MAXDEPTH) or
because the dictionary was full, the most recently built dictionary entry does not point to the address of the
last (PREVCODE,CH) match (i.e., FOUNDCODE). Thus, block 516 sets LAST_CODE_BUILT to an INVALID
address which cannot be matched with FOUNDCODE in the next search operation. Block 518 replaces
PREVCODE with CH (PREVCODE = CH). Since PREVCODE now represents a one character string, DEPTH is
set to 1.

Block 518 then jumps back to the block 476 to read the next character from the input data stream. The 
LZSD2 compression continues until all characters in the input character stream are encoded. 

LZSD2 Decompression 


In the present embodiment, the same three dictionaries used for LZSD2 compression (CD, SD, FREE/PD)
are also used for implementing the LZSD2 decompression scheme. When the compressor runs out of locations
in which to store new strings (those which have a FREE/PD status register assignment), the decompressor
swaps dictionaries in a manner similar to that discussed above for LZSD2 compression. For example, CD
becomes FREE/PD, SD becomes CD, and SD becomes empty after a dictionary swap. For data decompression,
the processor interface 152 (FIG. 4) controls the flow of compressed data from the compressed data interface
148 through the compression/decompression engine 142 and out the string reversal queue in uncompressed
data interface 138.

FIG. 25 is a general flow diagram showing LZSD2 decompression. Block 520 initializes the
compression/decompression system shown in FIG. 4 for LZSD2 decompression. Block 522 then reads encoded input
strings (INCODE) from a compressed input data stream and stores the encoded characters into a temporary
variable/register. The input data stream represents the input character string previously encoded by the LZSD2
compression scheme described above in FIGS. 23 and 24A-C. Decision block 524 checks for an EOF condition






EP 0 666 651 A2 



indicating the end of the encoded character string. Decision block 526 determines whether the input character
[illegible] is stored in the next available FREE/PD address and assigned to the [illegible] 522 to repeat
another [illegible].

FIG. 26A-C is a detailed flowchart further describing the LZSD2 decompression [illegible]. At decompression
setup, the CAM [illegible] created the compressed data. Therefore, block [illegible] code, character and
status field is typically [illegible] values originally set at [illegible].

[illegible] Block 536 reads single bytes from the [illegible] ends decompression. If the input code read is
successful, [illegible] identify the specific [illegible] INCODE is a reserved code, decision [illegible]
CODE_SIZE is already at the [illegible] block 548 increases [illegible] to block 536 to read the next input
code (INCODE). [illegible] TCODE [illegible] block 556






where single characters CH are gleaned from TCODE and TCODE reassigned from the code field at address
[illegible].

Decision block 552 jumps to block 553 when TCODE is less than 256. TCODE then represents a single
[illegible] character strings [illegible] code as the ASCII code for that character. This allows TCODE to be
placed directly onto the top of the STACK [illegible] without doing a CAM look up. The [illegible] in the stack
[illegible]; therefore, a variable/register FIRST_CHAR could be eliminated by simply using [illegible].
However, FIRST_CHAR is [illegible] to make FIG. 26A-26C easier to understand.

[illegible] is now equal to the number of characters on the STACK since the final character placed on the
STACK [illegible] but back in block 544 DEPTH is used and changed [illegible]. If TDEPTH is greater than 0,
the STACK is not empty and decision block 560 jumps to block [illegible] are popped off STACK and output
until TDEPTH = 0.
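
String reversal through a LIFO stack, as used when a code is expanded, can be sketched as below. The example is a generic illustration and not a transcription of FIG. 26: it assumes that a code of 256 or more refers to CAM address code - 256 (a mapping chosen for the sketch), that codes below 256 are the characters themselves, and that each entry is a (code, character) pair. The name `expand` is hypothetical.

```python
def expand(cam, incode):
    """Expand `incode` into its string; return (bytes, first_char)."""
    stack = []
    tcode = incode
    while tcode >= 256:                 # dictionary entry: follow the chain
        code, ch = cam[tcode - 256]
        stack.append(ch)                # characters arrive last-to-first
        tcode = code
    stack.append(tcode)                 # root code is the character itself
    first_char = stack[-1]              # top of stack is the first character
    out = bytes(reversed(stack))        # popping the stack restores the order
    return out, first_char
```

Because the walk reaches the first character of the string last, that character sits on top of the stack, which is why it is readily available as the extension character for the next dictionary entry.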
When the STACK is empty, the decompressor is ready to read a new encoded character from the compressed
[illegible] the value of PREVDEPTH. If PREVDEPTH is between 0 and MAXDEPTH, the combined string PREVCODE
(previously read encoded character) and FIRST_CHAR are stored in the next available FREE/PD [illegible]
(PREVCODE, FIRST_CHAR, CD). If PREVDEPTH [illegible].

[illegible] the dictionary status fields for a FREE/PD value. The Code and Character [illegible] can be
successful regardless of what the Code or Character fields contain. Since it is possible for more than one
address to meet the conditions of the FREE/PD search, multiple matches must be reduced to one. Thus, block 566
uses the priority encoder used for compression [illegible].

[illegible] the search for a FREE/PD location failed, block 568 swaps the dictionaries, i.e., [illegible].
Failure to find a FREE/PD location means that the string represented by (PREVCODE, FIRST_CHAR) will not be
entered into the CAM.

[illegible] input string, PREVCODE, by the character FIRST_CHAR. The (PREVCODE, FIRST_CHAR) string stays in
CD until the same (PREVCODE, FIRST_CHAR) string occurs again in [illegible] (PREVCODE, FIRST_CHAR) data
entry is then promoted to [illegible] to FREE/PD when the next dictionary swap is performed.

[illegible] sets PREVCODE equal to the value of INCODE. PREVCODE may be used in the next pass
[illegible] to build a new (PREVCODE, FIRST_CHAR) character string for placing in the dictionary
by using the first character of the next input code as an extension character. PREVDEPTH is set equal
to the value of DEPTH in order to keep track of the string length of PREVCODE to prevent a greater than
maximum length string from being added to the CAM. Then the process returns to block 536 (FIG. 26A).

It has been shown how LZSD2 increases the overall compression ratio of a lossless compression/decompression
system by maintaining all data entries in dictionaries after a dictionary swap. Thus, all data entries
remain capable of matching new input character strings, maintaining current compression data. Thus,
compression will not drop off immediately following a dictionary swap. The LZSD2 also has the capability of
[illegible] selectively overwriting [illegible] compression performance is optimized for current trends in the
input data stream.

Having described and illustrated the principles of the invention in a preferred embodiment thereof, it should
be apparent that the invention can be modified in arrangement and detail without departing from such
principles. We claim all modifications and variation coming within the spirit and scope of the following claims.
Claims

1. A method for compressing and decompressing data consisting of character strings using a memory based
dictionary [illegible] unique address for storing a codeword (PREVCODE) for a character string (182) as a
data entry [illegible];
defining at least first and second dictionaries (450) within the plurality of storage locations of the
memory;
storing a codeword that uniquely corresponds to a character string as each data entry (468);
assigning each stored data entry to at least one of the first and second dictionaries (468);
generating a codeword value representing an input data character string (482), the codeword [illegible]
associated in memory with a previously stored codeword that corresponds to a portion of the input character
string and is assigned to one of said dictionaries; and
[illegible] overwriting one of the data entries currently assigned to one of [illegible]
new codeword associated with a new character string thereby using all data entries in the first and second
dictionaries for generating codewords for character strings at all times during data compression and
decompression (514).

2. A method according to claim 1 wherein at least one of said dictionaries is assigned an overwrite priority
and data entries [illegible] in said one dictionary (508).

3. A method according to claim 1 wherein the memory device has multiple states and the [illegible]
assignment for each data entry is determined according to a current state of the memory device (328).

4. A method according to claim 3 wherein each dictionary is assigned an overwrite priority and changing
the state of the memory device changes the overwrite priority of at least one of the dictionaries so that
[illegible].

5. A method according to claim 1 wherein storing codewords for character strings in the memory device
comprises:
[illegible] storage location in the first dictionary that is available to be overwritten with a new
codeword [illegible] (482);
reassigning the new data entry to the second dictionary (482); and
[illegible] all data entries in the second dictionary to the first dictionary after all available storage
locations in the first dictionary have been overwritten (504).

6. A method according to claim 5 including:
providing a content addressable memory (312) having a plurality of storage locations;
defining first, second, and third dictionaries within the plurality of storage locations of the content
addressable memory;
[illegible] as data entries in said storage locations, each codeword corresponding [illegible];
generating a codeword value representing a data character string, the codeword value associated
in memory with a previously stored codeword that corresponds to the character string and is assigned to
[illegible] word not presently stored as a data entry in any of said dictionaries (508); and
selectively overwriting the prioritized data entries currently assigned to said one dictionary with
new codewords corresponding to new character strings not currently stored in the memory [illegible]
at the same time using all data entries of each dictionary for generating codeword values at all times during
the compression and decompression process (488, 508, 514).

7. A method according to claim 6 wherein the content addressable memory has multiple [illegible]
dictionary assignment for each data entry depends upon the state of the content addressable memory (328).
8. A method according to claim 6 wherein each data entry is promoted to a dictionary with a higher priority,
thereby making the data entry less likely to be replaced with a new character string, according to the
number of times said data entry has been previously used for generating codeword values (504).

9. A method according to claim 6 wherein data entries are only available for selective replacement from a
single dictionary (508).



[FIG. 1: flow diagram. Box labels: INPUT DATA; INITIALIZE CD AND SD; COMPARE DATA STRING WITH CURRENT DICTIONARY; ADD UNMATCHED ENCODED STRINGS TO CURRENT DICTIONARY; OUTPUT CODE FOR LONGEST MATCHED STRING; ADD SELECT ENCODED STRINGS TO STANDBY DICTIONARY; TRANSFER SD ENTRIES TO CD, INITIALIZE A NEW SD.]

[FIG. 2: flow diagram. Box labels: INPUT DATA; COMPARE DATA STRING WITH CD ENTRIES; SET SD FLAG IN CD; SWITCH CD <-> SD; INITIALIZE NEW SD; WRITE STRING INTO SD; ADD STRING TO CD; INCREMENT SD ADDRESS COUNTER; INCREMENT CD ADDRESS COUNTER; ADD NEXT INPUT CHARACTER TO STRING; READ NEXT INPUT CHARACTER.]

[FIG. 6: diagram. Legible labels: SEARCH; ADDRN; NOMATCH; WRITE; (QUALIFIED).]

[FIG. 8: flow diagram. Box labels: START; INITIALIZE STRING TABLE MEMORY; READ 1ST INPUT CODE (CHARACTER) -> OMEGA; READ NEXT INPUT CHARACTER -> K; SEARCH STRING TABLE FOR OMEGA-K MATCH; AUTO-UPDATE STRING TABLE; INCREMENT ADDRESS COUNTER; OMEGA -> OUTPUT; CODE (K) -> OMEGA, READ NEXT INPUT CHARACTER -> K; MATCH ADD -> OMEGA, READ NEXT INPUT CHARACTER -> K. Note: IF DURING ANY READ NO MORE INPUT DATA IS AVAILABLE, OUTPUT OMEGA AND EXIT.]

[FIG. 9: flow diagram. Box labels: START; INITIALIZE STRING TABLE MEMORY; GET INPUT CODEWORD -> CODE -> OLDWORD; GENERATE K = BYTE (CODE), OUTPUT K; GET INPUT CODEWORD -> CODE -> INCODE; READ ENTRY AT ADDRESS = INCODE; OUTPUT K; CODEWORD FIELD -> ADDRESS; OLDCODE-K -> STRING TABLE (NEXTADDRESS); NEXTADDRESS update (operator not legible); INCODE -> OLDCODE.]

[FIG. 10: worked example. RAW DATA STREAM: R I N T I N T I N. Compression dictionary at ADDR 0 through ADDR 9. COMPRESSED CHARACTER STREAM: 0 1 2 3 5 7 2 ... DECOMPRESSED CHARACTER STREAM: R I N T I N T I N. Decompression dictionary at ADDR 0 through ADDR 8.]

[FIG. 11: block diagram of CAM 312. Signal labels: ADDRESS_IN; DATA_IN; DATA_MASK; CONTROL; MATCH_ADDRESS; MATCH SUCCESS; DATA_OUT.]

[FIG. 12: CAM data entry format. Fields: ST; CODE; CHAR.]

[FIG. 13: ST field encoding by state.]

ST FIELD   S=0   S=1   S=2   S=3
FREE       00    10    11    01
CD         10    11    01    00
SD         11    01    00    10
INV        01    00    10    11

[FIG. 15: status register 328. Labels: FULL RESET (AT POWER UP); CD FULL; INITIAL STATE 00 10 11 01; SHIFT 2 BITS; FREE, CD, SD, INV.]

[FIG. 17: labels: 328 CD; 328 SD; CAM DATA_IN (ST FIELD); CAM MASK (ST FIELD).]

[FIG. 18: flow diagram. Box labels: INITIALIZE S=0, ST=FREE(0), NEXT_CODE=2^M; READ 1ST INPUT CHAR, code(K) -> OMEGA; READ NEXT INPUT CHAR -> K; SEARCH CAM FOR MATCH (ST, OMEGA, K), ST=[CD(S) OR SD(S)]; on YES: MATCH ADD -> OMEGA, SD(S) -> ST(MATCH ADD); otherwise: OMEGA -> OUTPUT, [CD(S), OMEGA, K] -> (NEXT_CODE), code(K) -> OMEGA; SEARCH FOR ST=FREE(S); on YES: MATCH_ADD -> NEXT_CODE; otherwise: SWITCH DICTIONARIES, S+1 MOD 4 -> S.]




FIG. 19: decompression flowchart (reference numerals 398 through 413). Block text, in flow order:

  - INITIALIZE: S = 0, ST = FREE(0), NEXT_CODE = 2^M + 1, SAVE_CODE = 0
  - READ NEXT INPUT; INPUT CODE -> OMEGA
  - DECOMPRESS OMEGA
  - FIRST CHAR OF W -> C
  - BUILD DICTIONARY: [CD(S), SAVE_CODE, C] -> (NEXT_CODE)
  - SD(S) -> ST(OMEGA); OMEGA -> SAVE_CODE
  - FIND NEXT FREE LOCATION WITH ST = FREE(S)
  - Match: MATCH_ADD -> NEXT_CODE
  - No match: SWITCH DICTIONARIES, S = (S+1) MOD 4; FIND NEXT FREE LOCATION WITH ST = FREE(S); SAVE_CODE = 0






FIG. 20: worked example of compression and decompression with current and standby dictionaries. The sheet shows a RAW DATA STREAM (414), a COMPRESSED CHARACTER STREAM (420), a DECOMPRESSED CHARACTER STREAM, and dictionary snapshots with the columns ST, CODE and CHAR at ADDR0 through ADDR8, taken at S=0 and S=1 (reference numerals 416 through 438). The individual cell values are not recoverable from this copy.








FIG. 22(A):

  ADDRESS  CODE       CHAR  ST
  ADDR 0   PREVCODE1  CH1   CD
  ADDR 1   PREVCODE2  CH2   CD
  ADDR 2   PREVCODE3  CH3   CD
  ADDR 3   PREVCODE4  CH4   CD
  ADDR 4   NULL       NULL  FREE/PD
  ADDR 5   NULL       NULL  FREE/PD
  ADDR 6   NULL       NULL  FREE/PD
  ADDR 7   NULL       NULL  FREE/PD

Entries indicated beside the table: (PREVCODE1, CH1), (PREVCODE2, CH2), (PREVCODE3, CH3), (PREVCODE4, CH4).



FIG. 22(B) (rows in the order shown):

  CODE       CHAR  ST
  PREVCODE1  CH1   SD
  PREVCODE2  CH2   CD
  PREVCODE3  CH3   CD
  PREVCODE4  CH4   CD
  NULL       NULL  FREE/PD
  NULL       NULL  FREE/PD
  NULL       NULL  FREE/PD
  NULL       NULL  FREE/PD

Entries indicated beside the table: (PREVCODE1, CH1), (PREVCODE3, CH3), (PREVCODE4, CH4).






FIG. 22(C):

  ADDRESS  CODE       CHAR  ST
  ADDR 0   PREVCODE1  CH1   SD
  ADDR 1   PREVCODE2  CH2   CD
  ADDR 2   PREVCODE3  CH3   SD
  ADDR 3   PREVCODE4  CH4   SD
  ADDR 4   PREVCODE5  CH5   CD
  ADDR 5   PREVCODE6  CH6   CD
  ADDR 6   PREVCODE7  CH7   CD
  ADDR 7   PREVCODE8  CH8   CD

One further row (NULL, NULL, FREE/PD) appears without an address label. Entry indicated beside the table: (PREVCODE8, CH8).



FIG. 22(D):

  ADDRESS  CODE       CHAR  ST
  ADDR 0   PREVCODE1  CH1   CD
  ADDR 1   PREVCODE2  CH2   FREE/PD
  ADDR 2   PREVCODE3  CH3   CD
  ADDR 3   PREVCODE4  CH4   CD
  ADDR 4   PREVCODE5  CH5   FREE/PD
  ADDR 5   PREVCODE6  CH6   FREE/PD
  ADDR 6   PREVCODE7  CH7   FREE/PD
  ADDR 7   PREVCODE8  CH8   FREE/PD








FIG. 22(E):

  ADDRESS  CODE       CHAR  ST
  ADDR 0   PREVCODE1  CH1   SD
  ADDR 1   PREVCODE9  CH9   CD
  ADDR 2   PREVCODE3  CH3   CD
  ADDR 3   PREVCODE4  CH4   CD
  ADDR 4   PREVCODE5  CH5   SD
  ADDR 5   PREVCODE6  CH6   FREE/PD
  ADDR 6   PREVCODE7  CH7   FREE/PD
  ADDR 7   PREVCODE8  CH8   FREE/PD

Entries indicated beside the table: (PREVCODE1, CH1), (PREVCODE9, CH9), (PREVCODE3, CH3), (PREVCODE4, CH4).






FIG. 23: overview flowchart of compression (reference numerals 452 through 472). Block labels: INITIALIZATION; READ DATA; SEARCH DICTIONARY; UPDATE STATUS BITS & PREP FOR NEW CHAR (472); OUTPUT PREVIOUS MATCH; CLEANUP; OUTPUT PREVIOUS MATCH; UPDATE DICTIONARY; PREP FOR NEW STRING.






FIG. 24(A): detailed compression flowchart, part 1 (reference numerals 474 through 490). Block text:

  - RESET DICTIONARY & OUTPUT FORMATTER, CODE_SIZE = 9, LAST_CODE_BUILT = INVALID, DEPTH = 0, UNSET SWAP_FLAG
  - CH = READ INPUT BYTE
  - PREVCODE = CH, DEPTH = 1
  - FOUNDCODE = PRIORITY ENCODE SEARCH OF CAM FOR (PREVCODE, CH, *)
  - Decision: MATCH & LAST_CODE_BUILT ≠ FOUNDCODE & DEPTH < MAXDEPTH
  - SET STATUS BITS AT FOUNDCODE TO SD, PREVCODE = FOUNDCODE, DEPTH+1

Connectors: from Block 518 [see FIG. 24(C)]; two exits to Block 492 [see FIG. 24(B)].






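FIG. 24(B): detailed compression flowchart, part 2 (reference numerals 492 through 496). Block text:

  - Decision blocks 492 and 494 (one branch labeled N)
  - PACK GROW CODE INTO BYTES AND OUTPUT, CODE_SIZE+1
  - PACK PREVCODE INTO BYTES, OUTPUT ALL POSSIBLE BYTES

Connectors: from Block 490 and from Block 486 [see FIG. 24(A)]; to Block 506 [see FIG. 24(C)].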
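The two packing boxes in FIG. 24(B), "PACK PREVCODE INTO BYTES, OUTPUT ALL POSSIBLE BYTES" and "PACK GROW CODE INTO BYTES AND OUTPUT, CODE_SIZE+1", describe a bit accumulator that releases whole bytes as soon as they are complete. A minimal sketch follows. The class name `CodePacker`, the least-significant-bit-first order, and the zero-padded `flush` are assumptions made for illustration. The numeric value of the grow code is not given on the sheet, so the caller passes it in.

```python
class CodePacker:
    """Accumulate variable-width codes and release whole bytes."""

    def __init__(self, code_size=9):        # CODE_SIZE = 9 at reset, per FIG. 24(A)
        self.code_size = code_size
        self._acc = 0                        # pending bits, oldest in the low bits
        self._nbits = 0                      # number of pending bits

    def pack(self, code):
        """Append one code and return every complete byte now available."""
        self._acc |= code << self._nbits     # assumed order: low bits leave first
        self._nbits += self.code_size
        out = bytearray()
        while self._nbits >= 8:
            out.append(self._acc & 0xFF)
            self._acc >>= 8
            self._nbits -= 8
        return bytes(out)

    def emit_grow(self, grow_code):
        """Pack the caller-supplied grow code, then widen later codes by one bit."""
        out = self.pack(grow_code)
        self.code_size += 1
        return out

    def flush(self):
        """Release a final partial byte, zero padded (hypothetical helper)."""
        if self._nbits == 0:
            return b""
        out = bytes([self._acc & 0xFF])
        self._acc = 0
        self._nbits = 0
        return out


packer = CodePacker()
stream = packer.pack(0x1FF) + packer.pack(0) + packer.flush()
print(stream.hex())
```

Because nine-bit codes do not align with byte boundaries, a few bits usually stay in the accumulator between calls, which is why the figure speaks of outputting "all possible bytes" rather than a fixed count.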






FIG. 24(C): detailed compression flowchart, part 3 (reference numerals 508 through 518). Block text:

  - Decision block 508 (branches N and Y)
  - NEXTCODE = PRIORITY ENCODE SEARCH OF CAM FOR (*, *, FREE/PD)
  - Decision blocks 510 through 514 (branch Y)
  - PUT (PREVCODE, CH, CD) IN DICTIONARY AT NEXTCODE, LAST_CODE_BUILT = NEXTCODE
  - LAST_CODE_BUILT = INVALID
  - PREVCODE = CH, DEPTH = 1

Connectors: from Block 502 [see FIG. 24(B)]; to Block 476 [see FIG. 24(A)].






FIG. 25: overview flowchart of decompression (reference numerals 520 through 532). Block labels: INITIALIZATION; READ DATA; CONTROL CODE EVAL AND EXECUTE; OUTPUT STRING AND UPDATE ST BITS; UPDATE DICTIONARY; PREP FOR NEXT CODE.







FIG. 26(A): detailed decompression flowchart, part 1. Legible block text: (534) RESET CAM, STATE MACHINE & INPUT UNFORMATTER, CODE_SIZE = 9, PREVDEPTH = 0. Connector: from Block 574 [see FIG. 26(C)].






FIG. 26(B): detailed decompression flowchart, part 2. Legible block text:

  - DEPTH = 1, TCODE = INCODE
  - PUT TCODE ON STACK, FIRST_CHAR = TCODE, TDEPTH = DEPTH
  - (554) PUSH CH AT TCODE ON STACK, SET STATUS BITS AT TCODE TO SD, DEPTH+1, TCODE = CODE AT TCODE

Connector: from Block 540 [see FIG. 26(A)].




FIG. 26(C): detailed decompression flowchart, part 3 (reference numerals 560 through 574). Block text:

  - OUTPUT (POP STACK), TDEPTH--
  - NEXTCODE = PRIORITY ENCODE SEARCH OF CAM FOR (*, *, FREE/PD)
  - SWAP DICTIONARIES
  - PUT (PREVCODE, FIRST_CHAR, CD) IN CAM AT NEXTCODE
  - PREVCODE = INCODE, PREVDEPTH = DEPTH

Connectors: from Block 553 [see FIG. 26(B)]; to Block 536 [see FIG. 26(A)].


Europäisches Patentamt
European Patent Office
Office européen des brevets

(11) EP 0 666 651 A3

(12) EUROPEAN PATENT APPLICATION

(88) Date of publication A3: 22.05.1996 Bulletin 1996/21

(43) Date of publication A2: 09.08.1995 Bulletin 1995/32

(51) Int. Cl.6: H03M 7/30

(21) Application number: 95300346.4

(22) Date of filing: 20.01.1995

(84) Designated Contracting States: DE FR GB IT

(30) Priority: 07.02.1994 US 192878

(71) Applicant: Hewlett-Packard Company, Palo Alto, California 94304 (US)

(72) Inventors:
• Clark, Airell R., Albany, OR 97321-9335 (US)
• Tobin, Jeffrey P., Albany, OR 97321 (US)
• Seroussi, Gadiel, Cupertino, California 95014 (US)

(74) Representative: Colgan, Stephen James et al, CARPMAELS & RANSFORD, 43 Bloomsbury Square, London WC1A 2RA (GB)

(54) Apparatus and method for Lempel Ziv data compression with management of multiple dictionaries in content addressable memory



(57) A class of lossless data compression algorithms use a memory-based dictionary (312) of finite size to facilitate the compression and decompression of data. To reduce the loss in data compression caused by dictionary resets, a standby dictionary (328) is used to store a subset of encoded data entries previously stored in a current dictionary. In a second aspect of the invention, data is compressed/decompressed according to the address location of data entries contained within a dictionary built in a content addressable memory (CAM) (312). In a third aspect of the invention, the minimum memory/high compression capacity of the standby dictionary scheme is combined with the fast single-cycle per character encoding/decoding capacity of the CAM circuit. In a fourth aspect of the invention, a selective overwrite dictionary swapping technique is used to allow all data entries to be used at all times for encoding character strings (450-472).
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